kvm.vger.kernel.org archive mirror
* List of unaccessible x86 states
@ 2009-10-20 13:01 Jan Kiszka
  2009-10-20 13:10 ` Alexander Graf
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Jan Kiszka @ 2009-10-20 13:01 UTC (permalink / raw)
  To: kvm-devel; +Cc: Avi Kivity, Marcelo Tosatti, Gleb Natapov

Hi all,

as the list of yet user-unaccessible x86 states is a bit volatile ATM,
this is an attempt to collect the precise requirements for additional
state fields. Once everyone feels the list is complete, we can decide
how to partition it into one or more substates for the new
KVM_GET/SET_VCPU_STATE interface.

What I read so far (or tried to patch already):

- nmi_masked
- nmi_pending
- nmi_injected
- kvm_queued_exception (whole struct content)
- KVM_REQ_TRIPLE_FAULT (from vcpu.requests)

Unclear points (for me) from the last discussion:

- sipi_vector
- MCE (covered via kvm_queued_exception, or does it require more?)

Please extend or correct the list as required.
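
To make the list above concrete, here is a rough sketch of how these
fields might be grouped into one substate. The struct name, the field
widths, the padding and the init helper are all assumptions made for
illustration; this is not an existing KVM ABI, and the exception fields
only approximate what kvm_queued_exception is believed to hold.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical substate layout; names and widths are illustrative only. */
struct vcpu_event_state_sketch {
	uint8_t nmi_masked;
	uint8_t nmi_pending;
	uint8_t nmi_injected;
	uint8_t triple_fault;		/* would mirror KVM_REQ_TRIPLE_FAULT */
	struct {
		uint8_t pending;
		uint8_t has_error_code;
		uint8_t nr;		/* exception vector number */
		uint8_t pad;
		uint32_t error_code;
	} exception;			/* would mirror kvm_queued_exception */
	uint32_t sipi_vector;		/* listed as an unclear point above */
	uint32_t reserved[5];		/* room for later additions */
};

/* Zero the whole substate so reserved fields start in a defined state. */
void vcpu_event_state_sketch_init(struct vcpu_event_state_sketch *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
}
```

Keeping explicit padding and reserved words is one common way to let
such a structure grow without changing its size.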

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-20 13:01 List of unaccessible x86 states Jan Kiszka
@ 2009-10-20 13:10 ` Alexander Graf
  2009-10-20 13:19   ` Jan Kiszka
  2009-10-20 13:37   ` Jan Kiszka
  2009-10-20 13:35 ` Gleb Natapov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 13:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel, Avi Kivity, Marcelo Tosatti, Gleb Natapov


On 20.10.2009, at 15:01, Jan Kiszka wrote:

> Hi all,
>
> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> this is an attempt to collect the precise requirements for additional
> state fields. Once everyone feels the list is complete, we can decide
> how to partition it into one or more substates for the new
> KVM_GET/SET_VCPU_STATE interface.
>
> What I read so far (or tried to patch already):
>
> - nmi_masked
> - nmi_pending
> - nmi_injected
> - kvm_queued_exception (whole struct content)
> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>
> Unclear points (for me) from the last discussion:
>
> - sipi_vector
> - MCE (covered via kvm_queued_exception, or does it require more?)
>
> Please extend or correct the list as required.

hflags. Qemu supports GIF, kvm supports GIF, but neither side knows how to
sync it.

Alex


* Re: List of unaccessible x86 states
  2009-10-20 13:10 ` Alexander Graf
@ 2009-10-20 13:19   ` Jan Kiszka
  2009-10-20 13:27     ` Gleb Natapov
  2009-10-20 13:27     ` Alexander Graf
  2009-10-20 13:37   ` Jan Kiszka
  1 sibling, 2 replies; 40+ messages in thread
From: Jan Kiszka @ 2009-10-20 13:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-devel, Avi Kivity, Marcelo Tosatti, Gleb Natapov

Alexander Graf wrote:
> On 20.10.2009, at 15:01, Jan Kiszka wrote:
> 
>> Hi all,
>>
>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
>> this is an attempt to collect the precise requirements for additional
>> state fields. Once everyone feels the list is complete, we can decide
>> how to partition it into one or more substates for the new
>> KVM_GET/SET_VCPU_STATE interface.
>>
>> What I read so far (or tried to patch already):
>>
>> - nmi_masked
>> - nmi_pending
>> - nmi_injected
>> - kvm_queued_exception (whole struct content)
>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>
>> Unclear points (for me) from the last discussion:
>>
>> - sipi_vector
>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>
>> Please extend or correct the list as required.
> 
> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to  
> sync it.

OK. Whole hflags or just the GIF bit?

If we allow access to all bits, can user space cause any problems
(beyond screwing up its guests) by passing weird patterns?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: List of unaccessible x86 states
  2009-10-20 13:19   ` Jan Kiszka
@ 2009-10-20 13:27     ` Gleb Natapov
  2009-10-20 13:29       ` Jan Kiszka
  2009-10-20 13:27     ` Alexander Graf
  1 sibling, 1 reply; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 13:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alexander Graf, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 03:19:41PM +0200, Jan Kiszka wrote:
> Alexander Graf wrote:
> > On 20.10.2009, at 15:01, Jan Kiszka wrote:
> > 
> >> Hi all,
> >>
> >> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> >> this is an attempt to collect the precise requirements for additional
> >> state fields. Once everyone feels the list is complete, we can decide
> >> how to partition it into one or more substates for the new
> >> KVM_GET/SET_VCPU_STATE interface.
> >>
> >> What I read so far (or tried to patch already):
> >>
> >> - nmi_masked
> >> - nmi_pending
> >> - nmi_injected
> >> - kvm_queued_exception (whole struct content)
> >> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>
> >> Unclear points (for me) from the last discussion:
> >>
> >> - sipi_vector
> >> - MCE (covered via kvm_queued_exception, or does it require more?)
> >>
> >> Please extend or correct the list as required.
> > 
> > hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to  
> > sync it.
> 
> OK. Whole hflags or just the GIF bit?
> 
> If we allow access to all bits, can user space cause any problems
> (beyond screwing up its guests) by passing weird patterns?
> 
HF_NMI_MASK should be migrated too. The destination should enable the IRET
intercept if HF_NMI_MASK is set. Or we can assume that migration in the
middle of an NMI will never happen :)

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 13:19   ` Jan Kiszka
  2009-10-20 13:27     ` Gleb Natapov
@ 2009-10-20 13:27     ` Alexander Graf
  1 sibling, 0 replies; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 13:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel, Avi Kivity, Marcelo Tosatti, Gleb Natapov


On 20.10.2009, at 15:19, Jan Kiszka wrote:

> Alexander Graf wrote:
>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>
>>> Hi all,
>>>
>>> as the list of yet user-unaccessible x86 states is a bit volatile  
>>> ATM,
>>> this is an attempt to collect the precise requirements for  
>>> additional
>>> state fields. Once everyone feels the list is complete, we can  
>>> decide
>>> how to partition it into one or more substates for the new
>>> KVM_GET/SET_VCPU_STATE interface.
>>>
>>> What I read so far (or tried to patch already):
>>>
>>> - nmi_masked
>>> - nmi_pending
>>> - nmi_injected
>>> - kvm_queued_exception (whole struct content)
>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>
>>> Unclear points (for me) from the last discussion:
>>>
>>> - sipi_vector
>>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>>
>>> Please extend or correct the list as required.
>>
>> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to
>> sync it.
>
> OK. Whole hflags or just the GIF bit?

agraf@busu:~/git/kvm> grep -R HF_ arch/x86/include/asm/*kvm*
arch/x86/include/asm/kvm_host.h:#define HF_GIF_MASK		(1 << 0)
arch/x86/include/asm/kvm_host.h:#define HF_HIF_MASK		(1 << 1)
arch/x86/include/asm/kvm_host.h:#define HF_VINTR_MASK		(1 << 2)
arch/x86/include/asm/kvm_host.h:#define HF_NMI_MASK		(1 << 3)
arch/x86/include/asm/kvm_host.h:#define HF_IRET_MASK		(1 << 4)

I can only speak for GIF here, and that should be fine. Without knowing
about the others, it does seem like we could get race conditions though.

> If we allow access to all bits, can user space cause any problems
> (beyond screwing up its guests) by passing weird patterns?

IMHO the hflags should be converted between userspace and kernel  
representation. There's a good chance we run older userspace that  
doesn't know about certain flags yet and I'd like to keep the bits as  
flexible as possible.

Alex



* Re: List of unaccessible x86 states
  2009-10-20 13:27     ` Gleb Natapov
@ 2009-10-20 13:29       ` Jan Kiszka
  2009-10-20 13:32         ` Gleb Natapov
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2009-10-20 13:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Alexander Graf, kvm-devel, Avi Kivity, Marcelo Tosatti

Gleb Natapov wrote:
> On Tue, Oct 20, 2009 at 03:19:41PM +0200, Jan Kiszka wrote:
>> Alexander Graf wrote:
>>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>>
>>>> Hi all,
>>>>
>>>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
>>>> this is an attempt to collect the precise requirements for additional
>>>> state fields. Once everyone feels the list is complete, we can decide
>>>> how to partition it into one or more substates for the new
>>>> KVM_GET/SET_VCPU_STATE interface.
>>>>
>>>> What I read so far (or tried to patch already):
>>>>
>>>> - nmi_masked
>>>> - nmi_pending
>>>> - nmi_injected
>>>> - kvm_queued_exception (whole struct content)
>>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>>
>>>> Unclear points (for me) from the last discussion:
>>>>
>>>> - sipi_vector
>>>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>>>
>>>> Please extend or correct the list as required.
>>> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to  
>>> sync it.
>> OK. Whole hflags or just the GIF bit?
>>
>> If we allow access to all bits, can user space cause any problems
>> (beyond screwing up its guests) by passing weird patterns?
>>
> HF_NMI_MASK should be migrated too. Destination should enable IRET intercept if
> HF_NMI_MASK is set. Or we can assume that migration in the middle of NMI
> will never happen :)

HF_NMI_MASK is redundant with the vendor-agnostic nmi_masked and would
therefore likely be masked out.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: List of unaccessible x86 states
  2009-10-20 13:29       ` Jan Kiszka
@ 2009-10-20 13:32         ` Gleb Natapov
  0 siblings, 0 replies; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 13:32 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alexander Graf, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 03:29:38PM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Oct 20, 2009 at 03:19:41PM +0200, Jan Kiszka wrote:
> >> Alexander Graf wrote:
> >>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> >>>> this is an attempt to collect the precise requirements for additional
> >>>> state fields. Once everyone feels the list is complete, we can decide
> >>>> how to partition it into one or more substates for the new
> >>>> KVM_GET/SET_VCPU_STATE interface.
> >>>>
> >>>> What I read so far (or tried to patch already):
> >>>>
> >>>> - nmi_masked
> >>>> - nmi_pending
> >>>> - nmi_injected
> >>>> - kvm_queued_exception (whole struct content)
> >>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>>>
> >>>> Unclear points (for me) from the last discussion:
> >>>>
> >>>> - sipi_vector
> >>>> - MCE (covered via kvm_queued_exception, or does it require more?)
> >>>>
> >>>> Please extend or correct the list as required.
> >>> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to  
> >>> sync it.
> >> OK. Whole hflags or just the GIF bit?
> >>
> >> If we allow access to all bits, can user space cause any problems
> >> (beyond screwing up its guests) by passing weird patterns?
> >>
> > HF_NMI_MASK should be migrated too. Destination should enable IRET intercept if
> > HF_NMI_MASK is set. Or we can assume that migration in the middle of NMI
> > will never happen :)
> 
> HF_NMI_MASK is redundant to the vendor-agnostic nmi_masked and would
> therefore likely be masked out.
> 
Correct. We can restore HF_NMI_MASK from nmi_masked.
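
Roughly like this, as a sketch only. The struct is a stand-in for the
real SVM vcpu state, and SKETCH_INTERCEPT_IRET is a placeholder for
whatever the real intercept bit is; both are assumptions. HF_GIF_MASK and
HF_NMI_MASK use the values from the grep output earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HF_GIF_MASK		(1 << 0)
#define HF_NMI_MASK		(1 << 3)

/* Placeholder bit for the IRET intercept; the position is an assumption. */
#define SKETCH_INTERCEPT_IRET	(1ULL << 20)

/* Stand-in for the vendor-specific vcpu state, for illustration only. */
struct svm_sketch {
	uint32_t hflags;
	uint64_t intercept;
};

/*
 * Rebuild the SVM-internal NMI mask from the vendor-agnostic nmi_masked
 * value. While NMIs are masked, IRET is intercepted so the end of the
 * NMI handler can be observed.
 */
void svm_sketch_set_nmi_masked(struct svm_sketch *svm, bool nmi_masked)
{
	assert(svm != NULL);
	if (nmi_masked) {
		svm->hflags |= HF_NMI_MASK;
		svm->intercept |= SKETCH_INTERCEPT_IRET;
	} else {
		svm->hflags &= ~HF_NMI_MASK;
		svm->intercept &= ~SKETCH_INTERCEPT_IRET;
	}
}
```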

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 13:01 List of unaccessible x86 states Jan Kiszka
  2009-10-20 13:10 ` Alexander Graf
@ 2009-10-20 13:35 ` Gleb Natapov
  2009-10-20 18:45 ` Marcelo Tosatti
  2009-10-23 19:34 ` Jan Kiszka
  3 siblings, 0 replies; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 13:35 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 03:01:15PM +0200, Jan Kiszka wrote:
> Hi all,
> 
> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> this is an attempt to collect the precise requirements for additional
> state fields. Once everyone feels the list is complete, we can decide
> how to partition it into one or more substates for the new
> KVM_GET/SET_VCPU_STATE interface.
> 
> What I read so far (or tried to patch already):
> 
> - nmi_masked
> - nmi_pending
> - nmi_injected
> - kvm_queued_exception (whole struct content)
> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> 
> Unclear points (for me) from the last discussion:
> 
> - sipi_vector
Should be migrated.

> - MCE (covered via kvm_queued_exception, or does it require more?)
> 
> Please extend or correct the list as required.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT SE 2
> Corporate Competence Center Embedded Linux

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 13:10 ` Alexander Graf
  2009-10-20 13:19   ` Jan Kiszka
@ 2009-10-20 13:37   ` Jan Kiszka
  2009-10-20 13:41     ` Alexander Graf
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2009-10-20 13:37 UTC (permalink / raw)
  To: Alexander Graf, oritw
  Cc: kvm-devel, Avi Kivity, Marcelo Tosatti, Gleb Natapov

Alexander Graf wrote:
> On 20.10.2009, at 15:01, Jan Kiszka wrote:
> 
>> Hi all,
>>
>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
>> this is an attempt to collect the precise requirements for additional
>> state fields. Once everyone feels the list is complete, we can decide
>> how to partition it into one or more substates for the new
>> KVM_GET/SET_VCPU_STATE interface.
>>
>> What I read so far (or tried to patch already):
>>
>> - nmi_masked
>> - nmi_pending
>> - nmi_injected
>> - kvm_queued_exception (whole struct content)
>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>
>> Unclear points (for me) from the last discussion:
>>
>> - sipi_vector
>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>
>> Please extend or correct the list as required.
> 
> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to  
> sync it.

BTW, GIF is related to svm nesting, right?

Orit, are there any additional states arriving on the vmx side as well
with your nesting patches?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: List of unaccessible x86 states
  2009-10-20 13:37   ` Jan Kiszka
@ 2009-10-20 13:41     ` Alexander Graf
  2009-10-20 13:48       ` Gleb Natapov
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 13:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: oritw, kvm-devel, Avi Kivity, Marcelo Tosatti, Gleb Natapov


On 20.10.2009, at 15:37, Jan Kiszka wrote:

> Alexander Graf wrote:
>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>
>>> Hi all,
>>>
>>> as the list of yet user-unaccessible x86 states is a bit volatile  
>>> ATM,
>>> this is an attempt to collect the precise requirements for  
>>> additional
>>> state fields. Once everyone feels the list is complete, we can  
>>> decide
>>> how to partition it into one or more substates for the new
>>> KVM_GET/SET_VCPU_STATE interface.
>>>
>>> What I read so far (or tried to patch already):
>>>
>>> - nmi_masked
>>> - nmi_pending
>>> - nmi_injected
>>> - kvm_queued_exception (whole struct content)
>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>
>>> Unclear points (for me) from the last discussion:
>>>
>>> - sipi_vector
>>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>>
>>> Please extend or correct the list as required.
>>
>> hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to
>> sync it.
>
> BTW, GIF is related to svm nesting, right?

Yes and no. It's an architecture addition that came with SVM, yes.

The problem is that I don't want to support migrating while in a
nested VM. We can simply force a #VMEXIT right before migrating, using a
VMEXIT_INTR intercept.

Now just after #VMEXIT we're in a state that's pure host context, but  
has GIF=0. So we need to know about that in userspace to support  
migration.

Alex


* Re: List of unaccessible x86 states
  2009-10-20 13:41     ` Alexander Graf
@ 2009-10-20 13:48       ` Gleb Natapov
  2009-10-20 13:51         ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 13:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
> 
> On 20.10.2009, at 15:37, Jan Kiszka wrote:
> 
> >Alexander Graf wrote:
> >>On 20.10.2009, at 15:01, Jan Kiszka wrote:
> >>
> >>>Hi all,
> >>>
> >>>as the list of yet user-unaccessible x86 states is a bit
> >>>volatile ATM,
> >>>this is an attempt to collect the precise requirements for
> >>>additional
> >>>state fields. Once everyone feels the list is complete, we can
> >>>decide
> >>>how to partition it into one or more substates for the new
> >>>KVM_GET/SET_VCPU_STATE interface.
> >>>
> >>>What I read so far (or tried to patch already):
> >>>
> >>>- nmi_masked
> >>>- nmi_pending
> >>>- nmi_injected
> >>>- kvm_queued_exception (whole struct content)
> >>>- KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>>
> >>>Unclear points (for me) from the last discussion:
> >>>
> >>>- sipi_vector
> >>>- MCE (covered via kvm_queued_exception, or does it require more?)
> >>>
> >>>Please extend or correct the list as required.
> >>
> >>hflags. Qemu supports GIF, kvm supports GIF, but no side knows how to
> >>sync it.
> >
> >BTW, GIF is related to svm nesting, right?
> 
> Yes and no. It's an architecture addition that came with SVM, yes.
> 
> The problem is that I don't want to support migrating while in a
Why not?

> nested VM. We can just #VMEXIT just before migrating with a
> VMEXIT_INTR intercept.
> 
We don't notify the kernel about migration currently. CPU state is migrated
when the VM is already paused, so how can we exit the nested guest at this
point?

> Now just after #VMEXIT we're in a state that's pure host context,
> but has GIF=0. So we need to know about that in userspace to support
> migration.
> 
> Alex

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 13:48       ` Gleb Natapov
@ 2009-10-20 13:51         ` Alexander Graf
  2009-10-20 18:55           ` Gleb Natapov
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 13:51 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti


On 20.10.2009, at 15:48, Gleb Natapov wrote:

> On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
>>
>> On 20.10.2009, at 15:37, Jan Kiszka wrote:
>>
>>> Alexander Graf wrote:
>>>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> as the list of yet user-unaccessible x86 states is a bit
>>>>> volatile ATM,
>>>>> this is an attempt to collect the precise requirements for
>>>>> additional
>>>>> state fields. Once everyone feels the list is complete, we can
>>>>> decide
>>>>> how to partition it into one or more substates for the new
>>>>> KVM_GET/SET_VCPU_STATE interface.
>>>>>
>>>>> What I read so far (or tried to patch already):
>>>>>
>>>>> - nmi_masked
>>>>> - nmi_pending
>>>>> - nmi_injected
>>>>> - kvm_queued_exception (whole struct content)
>>>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>>>
>>>>> Unclear points (for me) from the last discussion:
>>>>>
>>>>> - sipi_vector
>>>>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>>>>
>>>>> Please extend or correct the list as required.
>>>>
>>>> hflags. Qemu supports GIF, kvm supports GIF, but no side knows  
>>>> how to
>>>> sync it.
>>>
>>> BTW, GIF is related to svm nesting, right?
>>
>> Yes and no. It's an architecture addition that came with SVM, yes.
>>
>> The problem is that I don't want to support migrating while in a
> Why not?

Because then we'd have to transfer the whole host cpu cache and the  
merged intercept bitmaps to userspace as well. That's just too many  
internals to expose IMHO.

>> nested VM. We can just #VMEXIT just before migrating with a
>> VMEXIT_INTR intercept.
>>
> We don't notify kernel about migration currently. CPU state is  
> migrated
> when VM is already paused, how we can exit nested guest at this point?

Hm - introduce a new ioctl? I haven't fully thought it through yet :-).

Alex



* Re: List of unaccessible x86 states
  2009-10-20 13:01 List of unaccessible x86 states Jan Kiszka
  2009-10-20 13:10 ` Alexander Graf
  2009-10-20 13:35 ` Gleb Natapov
@ 2009-10-20 18:45 ` Marcelo Tosatti
  2009-10-23 13:08   ` Jan Kiszka
  2009-10-23 19:34 ` Jan Kiszka
  3 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2009-10-20 18:45 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel, Avi Kivity, Gleb Natapov

On Tue, Oct 20, 2009 at 03:01:15PM +0200, Jan Kiszka wrote:
> Hi all,
> 
> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> this is an attempt to collect the precise requirements for additional
> state fields. Once everyone feels the list is complete, we can decide
> how to partition it into one or more substates for the new
> KVM_GET/SET_VCPU_STATE interface.
> 
> What I read so far (or tried to patch already):
> 
> - nmi_masked
> - nmi_pending
> - nmi_injected
> - kvm_queued_exception (whole struct content)
> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> 
> Unclear points (for me) from the last discussion:
> 
> - sipi_vector
> - MCE (covered via kvm_queued_exception, or does it require more?)

We should save/restore the MCE MSRs (their contents are currently
lost/overwritten AFAICS).

MTRR contents are also dropped.
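
As a sketch of which MCE MSRs that would cover, assuming the
architectural layout (global status and control registers plus four
registers per bank starting at 0x400). The helper name is hypothetical,
and the bank count would have to come from the guest's MCG_CAP, which is
not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural machine-check MSR indices. */
#define MSR_IA32_MCG_STATUS	0x17a
#define MSR_IA32_MCG_CTL	0x17b
#define MSR_IA32_MC0_CTL	0x400	/* bank n starts at 0x400 + 4 * n */

/*
 * Fill 'out' with the MSR indices to save/restore for 'banks' MCE banks.
 * Each bank contributes CTL, STATUS, ADDR and MISC, in that order.
 * Returns the number of indices written, or 0 if 'out' is too small.
 */
size_t mce_msr_indices(unsigned int banks, uint32_t *out, size_t max)
{
	size_t n = 0;
	unsigned int bank, reg;

	assert(out != NULL);
	if (max < 2 + 4 * (size_t)banks)
		return 0;

	out[n++] = MSR_IA32_MCG_STATUS;
	out[n++] = MSR_IA32_MCG_CTL;
	for (bank = 0; bank < banks; bank++)
		for (reg = 0; reg < 4; reg++)
			out[n++] = MSR_IA32_MC0_CTL + 4 * bank + reg;
	return n;
}
```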



* Re: List of unaccessible x86 states
  2009-10-20 13:51         ` Alexander Graf
@ 2009-10-20 18:55           ` Gleb Natapov
  2009-10-20 18:59             ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 18:55 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 03:51:02PM +0200, Alexander Graf wrote:
> 
> On 20.10.2009, at 15:48, Gleb Natapov wrote:
> 
> >On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
> >>
> >>On 20.10.2009, at 15:37, Jan Kiszka wrote:
> >>
> >>>Alexander Graf wrote:
> >>>>On 20.10.2009, at 15:01, Jan Kiszka wrote:
> >>>>
> >>>>>Hi all,
> >>>>>
> >>>>>as the list of yet user-unaccessible x86 states is a bit
> >>>>>volatile ATM,
> >>>>>this is an attempt to collect the precise requirements for
> >>>>>additional
> >>>>>state fields. Once everyone feels the list is complete, we can
> >>>>>decide
> >>>>>how to partition it into one or more substates for the new
> >>>>>KVM_GET/SET_VCPU_STATE interface.
> >>>>>
> >>>>>What I read so far (or tried to patch already):
> >>>>>
> >>>>>- nmi_masked
> >>>>>- nmi_pending
> >>>>>- nmi_injected
> >>>>>- kvm_queued_exception (whole struct content)
> >>>>>- KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>>>>
> >>>>>Unclear points (for me) from the last discussion:
> >>>>>
> >>>>>- sipi_vector
> >>>>>- MCE (covered via kvm_queued_exception, or does it require more?)
> >>>>>
> >>>>>Please extend or correct the list as required.
> >>>>
> >>>>hflags. Qemu supports GIF, kvm supports GIF, but no side
> >>>>knows how to
> >>>>sync it.
> >>>
> >>>BTW, GIF is related to svm nesting, right?
> >>
> >>Yes and no. It's an architecture addition that came with SVM, yes.
> >>
> >>The problem is that I don't want to support migrating while in a
> >Why not?
> 
> Because then we'd have to transfer the whole host cpu cache and the
> merged intercept bitmaps to userspace as well. That's just too many
> internals to expose IMHO.
> 
But the amount of information is constant no matter how many L2 guests
there are. Correct? We can expose it as a separate substate.

> >>nested VM. We can just #VMEXIT just before migrating with a
> >>VMEXIT_INTR intercept.
> >>
> >We don't notify kernel about migration currently. CPU state is
> >migrated
> >when VM is already paused, how we can exit nested guest at this point?
> 
> Hm - introduce a new ioctl? I haven't fully thought it through yet :-).
> 
There is no software problem that can't be solved by introducing a new
ioctl :)

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 18:55           ` Gleb Natapov
@ 2009-10-20 18:59             ` Alexander Graf
  2009-10-20 19:09               ` Gleb Natapov
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 18:59 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti


On 20.10.2009, at 20:55, Gleb Natapov wrote:

> On Tue, Oct 20, 2009 at 03:51:02PM +0200, Alexander Graf wrote:
>>
>> On 20.10.2009, at 15:48, Gleb Natapov wrote:
>>
>>> On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
>>>>
>>>> On 20.10.2009, at 15:37, Jan Kiszka wrote:
>>>>
>>>>> Alexander Graf wrote:
>>>>>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> as the list of yet user-unaccessible x86 states is a bit
>>>>>>> volatile ATM,
>>>>>>> this is an attempt to collect the precise requirements for
>>>>>>> additional
>>>>>>> state fields. Once everyone feels the list is complete, we can
>>>>>>> decide
>>>>>>> how to partition it into one or more substates for the new
>>>>>>> KVM_GET/SET_VCPU_STATE interface.
>>>>>>>
>>>>>>> What I read so far (or tried to patch already):
>>>>>>>
>>>>>>> - nmi_masked
>>>>>>> - nmi_pending
>>>>>>> - nmi_injected
>>>>>>> - kvm_queued_exception (whole struct content)
>>>>>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>>>>>
>>>>>>> Unclear points (for me) from the last discussion:
>>>>>>>
>>>>>>> - sipi_vector
>>>>>>> - MCE (covered via kvm_queued_exception, or does it require  
>>>>>>> more?)
>>>>>>>
>>>>>>> Please extend or correct the list as required.
>>>>>>
>>>>>> hflags. Qemu supports GIF, kvm supports GIF, but no side
>>>>>> knows how to
>>>>>> sync it.
>>>>>
>>>>> BTW, GIF is related to svm nesting, right?
>>>>
>>>> Yes and no. It's an architecture addition that came with SVM, yes.
>>>>
>>>> The problem is that I don't want to support migrating while in a
>>> Why not?
>>
>> Because then we'd have to transfer the whole host cpu cache and the
>> merged intercept bitmaps to userspace as well. That's just too many
>> internals to expose IMHO.
>>
> But the amount of information is constant no matter how l2 guest there
> are. Correct? We can expose it as separate substate.

Or we can just not migrate while in a nested guest :-). Which will  
make everything a lot easier.

Alex


* Re: List of unaccessible x86 states
  2009-10-20 18:59             ` Alexander Graf
@ 2009-10-20 19:09               ` Gleb Natapov
  2009-10-20 19:23                 ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 19:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 08:59:48PM +0200, Alexander Graf wrote:
> 
> On 20.10.2009, at 20:55, Gleb Natapov wrote:
> 
> >On Tue, Oct 20, 2009 at 03:51:02PM +0200, Alexander Graf wrote:
> >>
> >>On 20.10.2009, at 15:48, Gleb Natapov wrote:
> >>
> >>>On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
> >>>>
> >>>>On 20.10.2009, at 15:37, Jan Kiszka wrote:
> >>>>
> >>>>>Alexander Graf wrote:
> >>>>>>On 20.10.2009, at 15:01, Jan Kiszka wrote:
> >>>>>>
> >>>>>>>Hi all,
> >>>>>>>
> >>>>>>>as the list of yet user-unaccessible x86 states is a bit
> >>>>>>>volatile ATM,
> >>>>>>>this is an attempt to collect the precise requirements for
> >>>>>>>additional
> >>>>>>>state fields. Once everyone feels the list is complete, we can
> >>>>>>>decide
> >>>>>>>how to partition it into one or more substates for the new
> >>>>>>>KVM_GET/SET_VCPU_STATE interface.
> >>>>>>>
> >>>>>>>What I read so far (or tried to patch already):
> >>>>>>>
> >>>>>>>- nmi_masked
> >>>>>>>- nmi_pending
> >>>>>>>- nmi_injected
> >>>>>>>- kvm_queued_exception (whole struct content)
> >>>>>>>- KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>>>>>>
> >>>>>>>Unclear points (for me) from the last discussion:
> >>>>>>>
> >>>>>>>- sipi_vector
> >>>>>>>- MCE (covered via kvm_queued_exception, or does it
> >>>>>>>require more?)
> >>>>>>>
> >>>>>>>Please extend or correct the list as required.
> >>>>>>
> >>>>>>hflags. Qemu supports GIF, kvm supports GIF, but no side
> >>>>>>knows how to
> >>>>>>sync it.
> >>>>>
> >>>>>BTW, GIF is related to svm nesting, right?
> >>>>
> >>>>Yes and no. It's an architecture addition that came with SVM, yes.
> >>>>
> >>>>The problem is that I don't want to support migrating while in a
> >>>Why not?
> >>
> >>Because then we'd have to transfer the whole host cpu cache and the
> >>merged intercept bitmaps to userspace as well. That's just too many
> >>internals to expose IMHO.
> >>
> >But the amount of information is constant no matter how l2 guest there
> >are. Correct? We can expose it as separate substate.
> 
> Or we can just not migrate while in a nested guest :-). Which will
> make everything a lot easier.
> 
Suppose we have an L2 guest that handles interrupts/NMIs by itself; how can
we force it to exit? I don't think requesting a certain CPU state before
migration is the right thing to do. What if the user paused a VM and then
decided to migrate? Or the VM was paused automatically because of a shortage
of disk space, and management wants to migrate the VM to another host with a
bigger disk?

--
			Gleb.


* Re: List of unaccessible x86 states
  2009-10-20 19:09               ` Gleb Natapov
@ 2009-10-20 19:23                 ` Alexander Graf
  2009-10-20 19:31                   ` Gleb Natapov
  2009-10-25  9:46                   ` Avi Kivity
  0 siblings, 2 replies; 40+ messages in thread
From: Alexander Graf @ 2009-10-20 19:23 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti


On 20.10.2009, at 21:09, Gleb Natapov wrote:

> On Tue, Oct 20, 2009 at 08:59:48PM +0200, Alexander Graf wrote:
>>
>> On 20.10.2009, at 20:55, Gleb Natapov wrote:
>>
>>> On Tue, Oct 20, 2009 at 03:51:02PM +0200, Alexander Graf wrote:
>>>>
>>>> On 20.10.2009, at 15:48, Gleb Natapov wrote:
>>>>
>>>>> On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
>>>>>>
>>>>>> On 20.10.2009, at 15:37, Jan Kiszka wrote:
>>>>>>
>>>>>>> Alexander Graf wrote:
>>>>>>>> On 20.10.2009, at 15:01, Jan Kiszka wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> as the list of yet user-unaccessible x86 states is a bit
>>>>>>>>> volatile ATM,
>>>>>>>>> this is an attempt to collect the precise requirements for
>>>>>>>>> additional
>>>>>>>>> state fields. Once everyone feels the list is complete, we can
>>>>>>>>> decide
>>>>>>>>> how to partition it into one ore more substates for the new
>>>>>>>>> KVM_GET/SET_VCPU_STATE interface.
>>>>>>>>>
>>>>>>>>> What I read so far (or tried to patch already):
>>>>>>>>>
>>>>>>>>> - nmi_masked
>>>>>>>>> - nmi_pending
>>>>>>>>> - nmi_injected
>>>>>>>>> - kvm_queued_exception (whole struct content)
>>>>>>>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>>>>>>>
>>>>>>>>> Unclear points (for me) from the last discussion:
>>>>>>>>>
>>>>>>>>> - sipi_vector
>>>>>>>>> - MCE (covered via kvm_queued_exception, or does it
>>>>>>>>> require more?)
>>>>>>>>>
>>>>>>>>> Please extend or correct the list as required.
>>>>>>>>
>>>>>>>> hflags. Qemu supports GIF, kvm supports GIF, but no side
>>>>>>>> knows how to
>>>>>>>> sync it.
>>>>>>>
>>>>>>> BTW, GIF is related to svm nesting, right?
>>>>>>
>>>>>> Yes and no. It's an architecture addition that came with SVM,  
>>>>>> yes.
>>>>>>
>>>>>> The problem is that I don't want to support migrating while in a
>>>>> Why not?
>>>>
>>>> Because then we'd have to transfer the whole host cpu cache and the
>>>> merged intercept bitmaps to userspace as well. That's just too many
>>>> internals to expose IMHO.
>>>>
>>> But the amount of information is constant no matter how l2 guest  
>>> there
>>> are. Correct? We can expose it as separate substate.
>>
>> Or we can just not migrate while in a nested guest :-). Which will
>> make everything a lot easier.
>>
> Suppose we have a l2 guest that handles interrupt/nmis by itself how  
> can we
> force it to exit?

If the nested hypervisor doesn't intercept INTR we don't support it  
anyways.

> I don't think requesting certain cpu state before
> migration is the right thing to do. What if user paused a VM and then
> decided to migrate?

So pausing has to make it go out of nested guest context too?
Then we're not in the nested guest context, right? :)

> Or VM was paused automatically because of shortage
> of disk space and management want to migrate VM to other host with
> bigger disk?

Same as before.


Really, pushing the whole nesting state over is not a good idea.

Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-20 19:23                 ` Alexander Graf
@ 2009-10-20 19:31                   ` Gleb Natapov
  2009-10-25  9:46                   ` Avi Kivity
  1 sibling, 0 replies; 40+ messages in thread
From: Gleb Natapov @ 2009-10-20 19:31 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jan Kiszka, oritw, kvm-devel, Avi Kivity, Marcelo Tosatti

On Tue, Oct 20, 2009 at 09:23:22PM +0200, Alexander Graf wrote:
> 
> On 20.10.2009, at 21:09, Gleb Natapov wrote:
> 
> >On Tue, Oct 20, 2009 at 08:59:48PM +0200, Alexander Graf wrote:
> >>
> >>On 20.10.2009, at 20:55, Gleb Natapov wrote:
> >>
> >>>On Tue, Oct 20, 2009 at 03:51:02PM +0200, Alexander Graf wrote:
> >>>>
> >>>>On 20.10.2009, at 15:48, Gleb Natapov wrote:
> >>>>
> >>>>>On Tue, Oct 20, 2009 at 03:41:57PM +0200, Alexander Graf wrote:
> >>>>>>
> >>>>>>On 20.10.2009, at 15:37, Jan Kiszka wrote:
> >>>>>>
> >>>>>>>Alexander Graf wrote:
> >>>>>>>>On 20.10.2009, at 15:01, Jan Kiszka wrote:
> >>>>>>>>
> >>>>>>>>>Hi all,
> >>>>>>>>>
> >>>>>>>>>as the list of yet user-unaccessible x86 states is a bit
> >>>>>>>>>volatile ATM,
> >>>>>>>>>this is an attempt to collect the precise requirements for
> >>>>>>>>>additional
> >>>>>>>>>state fields. Once everyone feels the list is complete, we can
> >>>>>>>>>decide
> >>>>>>>>>how to partition it into one ore more substates for the new
> >>>>>>>>>KVM_GET/SET_VCPU_STATE interface.
> >>>>>>>>>
> >>>>>>>>>What I read so far (or tried to patch already):
> >>>>>>>>>
> >>>>>>>>>- nmi_masked
> >>>>>>>>>- nmi_pending
> >>>>>>>>>- nmi_injected
> >>>>>>>>>- kvm_queued_exception (whole struct content)
> >>>>>>>>>- KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>>>>>>>>
> >>>>>>>>>Unclear points (for me) from the last discussion:
> >>>>>>>>>
> >>>>>>>>>- sipi_vector
> >>>>>>>>>- MCE (covered via kvm_queued_exception, or does it
> >>>>>>>>>require more?)
> >>>>>>>>>
> >>>>>>>>>Please extend or correct the list as required.
> >>>>>>>>
> >>>>>>>>hflags. Qemu supports GIF, kvm supports GIF, but no side
> >>>>>>>>knows how to
> >>>>>>>>sync it.
> >>>>>>>
> >>>>>>>BTW, GIF is related to svm nesting, right?
> >>>>>>
> >>>>>>Yes and no. It's an architecture addition that came with
> >>>>>>SVM, yes.
> >>>>>>
> >>>>>>The problem is that I don't want to support migrating while in a
> >>>>>Why not?
> >>>>
> >>>>Because then we'd have to transfer the whole host cpu cache and the
> >>>>merged intercept bitmaps to userspace as well. That's just too many
> >>>>internals to expose IMHO.
> >>>>
> >>>But the amount of information is constant no matter how l2
> >>>guest there
> >>>are. Correct? We can expose it as separate substate.
> >>
> >>Or we can just not migrate while in a nested guest :-). Which will
> >>make everything a lot easier.
> >>
> >Suppose we have a l2 guest that handles interrupt/nmis by itself
> >how can we
> >force it to exit?
> 
> If the nested hypervisor doesn't intercept INTR we don't support it
> anyways.
> 
Why? I looked at the code briefly and it looks like we just inject the
interrupt as usual instead of doing a nested exit if l2 does not intercept
INTR. Have I misinterpreted the code? Even if I have, why not support
it?

> >I don't think requesting certain cpu state before
> >migration is the right thing to do. What if user paused a VM and then
> >decided to migrate?
> 
> So pausing has to make it go out of nested guest context too?
Probably.

> Then we're not in the nested guest context, right? :)
> 
> >Or VM was paused automatically because of shortage
> >of disk space and management want to migrate VM to other host with
> >bigger disk?
> 
> Same as before.
What do you mean?

> 
> 
> Really, pushing the whole nesting state over is not a good idea.
> 
Maybe just disallow migration while a nested guest is running, then?
Cross-vendor migration is not possible anyway.

--
			Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-20 18:45 ` Marcelo Tosatti
@ 2009-10-23 13:08   ` Jan Kiszka
  2009-10-23 17:00     ` Marcelo Tosatti
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2009-10-23 13:08 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Avi Kivity, Gleb Natapov

Marcelo Tosatti wrote:
> On Tue, Oct 20, 2009 at 03:01:15PM +0200, Jan Kiszka wrote:
>> Hi all,
>>
>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
>> this is an attempt to collect the precise requirements for additional
>> state fields. Once everyone feels the list is complete, we can decide
>> how to partition it into one ore more substates for the new
>> KVM_GET/SET_VCPU_STATE interface.
>>
>> What I read so far (or tried to patch already):
>>
>> - nmi_masked
>> - nmi_pending
>> - nmi_injected
>> - kvm_queued_exception (whole struct content)
>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>
>> Unclear points (for me) from the last discussion:
>>
>> - sipi_vector
>> - MCE (covered via kvm_queued_exception, or does it require more?)
> 
> Should save/restore the MCE MSRs (its contents are currently
> lost/overwritten AFAICS).
> 
> MTRR contents are also dropped.

Hmm, the code path is winding, but aren't they already available to user
space via GET/SET_MSRS?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-23 13:08   ` Jan Kiszka
@ 2009-10-23 17:00     ` Marcelo Tosatti
  2009-10-23 19:26       ` Jan Kiszka
  0 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2009-10-23 17:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel, Avi Kivity, Gleb Natapov

On Fri, Oct 23, 2009 at 03:08:21PM +0200, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Tue, Oct 20, 2009 at 03:01:15PM +0200, Jan Kiszka wrote:
> >> Hi all,
> >>
> >> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> >> this is an attempt to collect the precise requirements for additional
> >> state fields. Once everyone feels the list is complete, we can decide
> >> how to partition it into one ore more substates for the new
> >> KVM_GET/SET_VCPU_STATE interface.
> >>
> >> What I read so far (or tried to patch already):
> >>
> >> - nmi_masked
> >> - nmi_pending
> >> - nmi_injected
> >> - kvm_queued_exception (whole struct content)
> >> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> >>
> >> Unclear points (for me) from the last discussion:
> >>
> >> - sipi_vector
> >> - MCE (covered via kvm_queued_exception, or does it require more?)
> > 
> > Should save/restore the MCE MSRs (its contents are currently
> > lost/overwritten AFAICS).
> > 
> > MTRR contents are also dropped.
> 
> Hmm, the code path is winding, but aren't they already available to user
> space via GET/SET_MSRS?

Yes, never mind, irrelevant to the current discussion.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-23 17:00     ` Marcelo Tosatti
@ 2009-10-23 19:26       ` Jan Kiszka
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Kiszka @ 2009-10-23 19:26 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Avi Kivity, Gleb Natapov

Marcelo Tosatti wrote:
> On Fri, Oct 23, 2009 at 03:08:21PM +0200, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Tue, Oct 20, 2009 at 03:01:15PM +0200, Jan Kiszka wrote:
>>>> Hi all,
>>>>
>>>> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
>>>> this is an attempt to collect the precise requirements for additional
>>>> state fields. Once everyone feels the list is complete, we can decide
>>>> how to partition it into one ore more substates for the new
>>>> KVM_GET/SET_VCPU_STATE interface.
>>>>
>>>> What I read so far (or tried to patch already):
>>>>
>>>> - nmi_masked
>>>> - nmi_pending
>>>> - nmi_injected
>>>> - kvm_queued_exception (whole struct content)
>>>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>>>
>>>> Unclear points (for me) from the last discussion:
>>>>
>>>> - sipi_vector
>>>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>> Should save/restore the MCE MSRs (its contents are currently
>>> lost/overwritten AFAICS).
>>>
>>> MTRR contents are also dropped.
>> Hmm, the code path is winding, but aren't they already available to user
>> space via GET/SET_MSRS?
> 
> Yes, nevermind, irrelevant to the current discussion.
> 

Oh, then I misunderstood your original reply as "we need to add them to
the list as well". Even better.

Jan


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-20 13:01 List of unaccessible x86 states Jan Kiszka
                   ` (2 preceding siblings ...)
  2009-10-20 18:45 ` Marcelo Tosatti
@ 2009-10-23 19:34 ` Jan Kiszka
  2009-10-24 10:35   ` Alexander Graf
  3 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2009-10-23 19:34 UTC (permalink / raw)
  To: kvm-devel; +Cc: Avi Kivity, Marcelo Tosatti, Gleb Natapov, Alexander Graf

Jan Kiszka wrote:
> Hi all,
> 
> as the list of yet user-unaccessible x86 states is a bit volatile ATM,
> this is an attempt to collect the precise requirements for additional
> state fields. Once everyone feels the list is complete, we can decide
> how to partition it into one ore more substates for the new
> KVM_GET/SET_VCPU_STATE interface.
> 
> What I read so far (or tried to patch already):
> 
> - nmi_masked
> - nmi_pending
> - nmi_injected
> - kvm_queued_exception (whole struct content)
> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
> 
> Unclear points (for me) from the last discussion:
> 
> - sipi_vector
> - MCE (covered via kvm_queued_exception, or does it require more?)
> 
> Please extend or correct the list as required.
> 

Here is a wrap-up of what has been reported so far:

 - NMI
    o nmi_masked
    o nmi_pending
    o nmi_injected
 - queued exception
    o kvm_queued_exception
    o triple_fault
 - SVM
    o gif
    (Are we sure that there is really nothing more here?)
 - sipi_vector

So the next question is how to map these onto substates. I'm currently
leaning towards this organization:

 - KVM_X86_VCPU_STATE_EVENTS
    o NMI states
    o pending exception
    o sipi_vector
    o pending interrupt?
      (would be redundant to kvm_sregs.interrupt_bitmap, but that struct
      may be obsoleted one day)
 - KVM_X86_VCPU_STATE_SVM
    o gif

Any concerns or better suggestions?
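
For illustration only, the grouping above could be laid out roughly as
follows. This is a sketch, not a proposed ABI: the struct names, field
widths, padding and numeric type IDs are all assumptions. Only the two
substate names and the items nmi_masked, nmi_pending, nmi_injected,
sipi_vector, triple_fault and gif come from the list above, and the
exception fields are merely assumed to mirror kvm_queued_exception.

```c
#include <assert.h>
#include <stdint.h>

/* Substate names as proposed above; the numeric values are made up. */
enum {
	KVM_X86_VCPU_STATE_EVENTS = 1,
	KVM_X86_VCPU_STATE_SVM    = 2,
};

/* Hypothetical layout of the "events" substate. */
struct x86_vcpu_events_sketch {
	struct {
		uint8_t  pending;        /* an exception is queued */
		uint8_t  nr;             /* exception vector */
		uint8_t  has_error_code;
		uint8_t  pad;
		uint32_t error_code;
	} exception;
	struct {
		uint8_t  masked;         /* nmi_masked */
		uint8_t  pending;        /* nmi_pending */
		uint8_t  injected;       /* nmi_injected */
		uint8_t  pad;
	} nmi;
	uint32_t sipi_vector;
	uint8_t  triple_fault;       /* stands in for KVM_REQ_TRIPLE_FAULT */
	uint8_t  pad[3];
};

/* Hypothetical layout of the SVM substate. */
struct x86_vcpu_svm_sketch {
	uint8_t gif;
	uint8_t pad[3];
};

/* Explicit padding keeps the layout identical for 32- and 64-bit users. */
static_assert(sizeof(struct x86_vcpu_events_sketch) == 20,
	      "events sketch should be 20 bytes");
static_assert(sizeof(struct x86_vcpu_svm_sketch) == 4,
	      "svm sketch should be 4 bytes");
```

Keeping every member naturally aligned and padding by hand avoids
compat problems between 32-bit userspace and a 64-bit kernel.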

Jan


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-23 19:34 ` Jan Kiszka
@ 2009-10-24 10:35   ` Alexander Graf
  2009-10-25  9:49     ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-24 10:35 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm-devel list, Avi Kivity, Marcelo Tosatti, Gleb Natapov,
	Joerg Roedel


On 23.10.2009, at 21:34, Jan Kiszka wrote:

> Jan Kiszka wrote:
>> Hi all,
>>
>> as the list of yet user-unaccessible x86 states is a bit volatile  
>> ATM,
>> this is an attempt to collect the precise requirements for additional
>> state fields. Once everyone feels the list is complete, we can decide
>> how to partition it into one ore more substates for the new
>> KVM_GET/SET_VCPU_STATE interface.
>>
>> What I read so far (or tried to patch already):
>>
>> - nmi_masked
>> - nmi_pending
>> - nmi_injected
>> - kvm_queued_exception (whole struct content)
>> - KVM_REQ_TRIPLE_FAULT (from vcpu.requests)
>>
>> Unclear points (for me) from the last discussion:
>>
>> - sipi_vector
>> - MCE (covered via kvm_queued_exception, or does it require more?)
>>
>> Please extend or correct the list as required.
>>
>
> Here is a wrap-up of what has been reported so far:
>
> - NMI
>    o nmi_masked
>    o nmi_pending
>    o nmi_injected
> - queued exception
>    o kvm_queued_exception
>    o triple_fault
> - SVM
>    o gif
>    (Are we sure that there is really nothing more here?)

Hm, thinking about this again, it might be useful to have a
"currently in nested VM" flag here. That way userspace can decide if
it needs to get out of the nested state (for migration) or if it just  
doesn't care.

> - sipi_vector
>
> So the next question is how to map these on substates. I'm currently
> leaning towards this organization:
>
> - KVM_X86_VCPU_STATE_EVENTS
>    o NMI states
>    o pending exception
>    o sipi_vector
>    o pending interrupt?
>      (would be redundant to kvm_sregs.interrupt_bitmap, but that  
> struct
>      may be obsoleted one day)
> - KVM_X86_VCPU_STATE_SVM
>    o gif

Can we make this an "svm_flags" or so u32? And then we'd just set bits?

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-20 19:23                 ` Alexander Graf
  2009-10-20 19:31                   ` Gleb Natapov
@ 2009-10-25  9:46                   ` Avi Kivity
  2009-10-25 13:53                     ` Alexander Graf
  1 sibling, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-25  9:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Jan Kiszka, oritw, kvm-devel, Marcelo Tosatti

On 10/20/2009 09:23 PM, Alexander Graf wrote:
>
> If the nested hypervisor doesn't intercept INTR we don't support it 
> anyways.

That's a bug.

> Really, pushing the whole nesting state over is not a good idea.

Isn't the entire state just one bit?  Everything else should be saved to 
guest memory.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-24 10:35   ` Alexander Graf
@ 2009-10-25  9:49     ` Avi Kivity
  2009-10-26  9:17       ` Joerg Roedel
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-25  9:49 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jan Kiszka, kvm-devel list, Marcelo Tosatti, Gleb Natapov,
	Joerg Roedel

On 10/24/2009 12:35 PM, Alexander Graf wrote:
>
> Hm, thinking about this again, it might be useful to have an 
> "currently in nested VM" flag here. That way userspace can decide if 
> it needs to get out of the nested state (for migration) or if it just 
> doesn't care.

Getting out of nested state involves modifying state (both memory and 
registers).  Nor can we in the general case force it.  The guest can set 
up a situation where it is impossible to #vmexit.

>> - KVM_X86_VCPU_STATE_SVM
>>    o gif
>
> Can we make this an "svm_flags" or so u32? And then we'd just set bits?
>

Or individual flags as u8s, so we don't get trapped into a specific 
encoding which is really an implementation detail.
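
To make the two options concrete, here is a rough sketch of both
encodings side by side. All names and bit positions are hypothetical,
and the guest_mode flag is only the "currently in nested VM" idea from
earlier in the thread, not something that exists today.

```c
#include <assert.h>
#include <stdint.h>

/* Variant A: a single u32 with one bit per flag (bit positions made up). */
#define SVM_FLAG_GIF        (1u << 0)
#define SVM_FLAG_GUEST_MODE (1u << 1)   /* "currently in nested VM" */

struct svm_bits_sketch {
	uint32_t svm_flags;
};

/* Variant B: one u8 per flag, so no bit encoding becomes part of the ABI. */
struct svm_bytes_sketch {
	uint8_t gif;
	uint8_t guest_mode;
	uint8_t pad[2];
};

static_assert(sizeof(struct svm_bits_sketch) == sizeof(struct svm_bytes_sketch),
	      "both variants occupy 4 bytes");

/* Convert variant A into variant B; each u8 ends up as 0 or 1. */
static struct svm_bytes_sketch svm_bits_to_bytes(struct svm_bits_sketch in)
{
	struct svm_bytes_sketch out = {
		.gif        = (in.svm_flags & SVM_FLAG_GIF) != 0,
		.guest_mode = (in.svm_flags & SVM_FLAG_GUEST_MODE) != 0,
	};
	return out;
}
```

Both variants cost the same four bytes here; the difference is only
whether the bit positions are frozen into the interface.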

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-25  9:46                   ` Avi Kivity
@ 2009-10-25 13:53                     ` Alexander Graf
  2009-10-25 14:08                       ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-25 13:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti


On 25.10.2009, at 10:46, Avi Kivity <avi@redhat.com> wrote:

> On 10/20/2009 09:23 PM, Alexander Graf wrote:
>>
>> If the nested hypervisor doesn't intercept INTR we don't support it  
>> anyways.
>
> That's a bug.

It's a question of how accurate we want to be.

>
>> Really, pushing the whole nesting state over is not a good idea.
>
> Isn't the entire state just one bit?  Everything else should be  
> saved to guest memory.

It's not. We can't use the guest memory for hsave because then the  
guest could break the l1 state, so a malicious hypervisor could break  
us.

Alex

>
> -- 
> error compiling committee.c: too many arguments to function
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-25 13:53                     ` Alexander Graf
@ 2009-10-25 14:08                       ` Avi Kivity
  2009-10-25 16:45                         ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-25 14:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti

On 10/25/2009 03:53 PM, Alexander Graf wrote:
>
> On 25.10.2009, at 10:46, Avi Kivity <avi@redhat.com> wrote:
>
>> On 10/20/2009 09:23 PM, Alexander Graf wrote:
>>>
>>> If the nested hypervisor doesn't intercept INTR we don't support it 
>>> anyways.
>>
>> That's a bug.
>
> It's a question of how accurate we want to be.

Even if we don't implement it immediately, it's still a bug.  It won't 
matter much until we hit a guest that needs it.

>>> Really, pushing the whole nesting state over is not a good idea.
>>
>> Isn't the entire state just one bit?  Everything else should be saved 
>> to guest memory.
>
> It's not. We can't use the guest memory for hsave because then the 
> guest could break the l1 state, so a malicious hypervisor could break us.

Guest hsave should be used for storing guest state when switching into 
the nested guest, not host state.  Host state is not part of the 
save/restore state in any case.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-25 14:08                       ` Avi Kivity
@ 2009-10-25 16:45                         ` Alexander Graf
  2009-10-26  8:33                           ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-25 16:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti


On 25.10.2009, at 15:08, Avi Kivity <avi@redhat.com> wrote:

> On 10/25/2009 03:53 PM, Alexander Graf wrote:
>>
>> On 25.10.2009, at 10:46, Avi Kivity <avi@redhat.com> wrote:
>>
>>> On 10/20/2009 09:23 PM, Alexander Graf wrote:
>>>>
>>>> If the nested hypervisor doesn't intercept INTR we don't support  
>>>> it anyways.
>>>
>>> That's a bug.
>>
>> It's a question of how accurate we want to be.
>
> Even if we don't implement it immediately, it's still a bug.  It  
> won't matter much until we hit a guest that needs it.
>
>>>> Really, pushing the whole nesting state over is not a good idea.
>>>
>>> Isn't the entire state just one bit?  Everything else should be  
>>> saved to guest memory.
>>
>> It's not. We can't use the guest memory for hsave because then the  
>> guest could break the l1 state, so a malicious hypervisor could  
>> break us.
>
> Guest hsave should be used for storing guest state when switching  
> into the nested guest, not host state.  Host state is not part of  
> the save/restore state in any case.

No it's not.

When entering an l2 guest, we need to save the l1 state in the hsave.
Now if we'd use the l1-given hsave, the l2 guest could modify the hsave.

That means the l2 guest could rewrite the intercept bitmap to 0 and
compromise the host.

That's why we're storing the hsave data in a host allocated page.

Of course, we could save the whole hsave area off to the host on
migration...

Alex
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-25 16:45                         ` Alexander Graf
@ 2009-10-26  8:33                           ` Avi Kivity
  2009-10-26  9:11                             ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-26  8:33 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti

On 10/25/2009 06:45 PM, Alexander Graf wrote:
>>> It's not. We can't use the guest memory for hsave because then the 
>>> guest could break the l1 state, so a malicious hypervisor could 
>>> break us.
>>
>> Guest hsave should be used for storing guest state when switching 
>> into the nested guest, not host state.  Host state is not part of the 
>> save/restore state in any case.
>
>
> No it's not.
>
> When going in an l2 guest, we need to save the l1 state in the hsave. 
> Now if we'd use the l1 given hsave, the l2 guest could modify the hsave.
>
> That means the l2 guest could rewrite the intercept bitmap to 0 and 
> compromize the host.

L1 hsave stores the architected state saved by vmrun, e.g. cs.sel, 
next_rip, cr0, cr3, etc.  The host intercept bitmap is not state since 
it is calculated from the L1 intercept bitmap and host code.  Indeed it 
can be different from host to host even with the same guest state.

> That's why we're storing the hsave data in a host allocated page.
>
> Of course, we could save the whole hsave are off to the host on 
> migeation...

Sorry, -ENOPARSE.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-26  8:33                           ` Avi Kivity
@ 2009-10-26  9:11                             ` Alexander Graf
  2009-10-26  9:19                               ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2009-10-26  9:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti


On 26.10.2009, at 09:33, Avi Kivity <avi@redhat.com> wrote:

> On 10/25/2009 06:45 PM, Alexander Graf wrote:
>>>> It's not. We can't use the guest memory for hsave because then  
>>>> the guest could break the l1 state, so a malicious hypervisor  
>>>> could break us.
>>>
>>> Guest hsave should be used for storing guest state when switching  
>>> into the nested guest, not host state.  Host state is not part of  
>>> the save/restore state in any case.
>>
>>
>> No it's not.
>>
>> When going in an l2 guest, we need to save the l1 state in the  
>> hsave. Now if we'd use the l1 given hsave, the l2 guest could  
>> modify the hsave.
>>
>> That means the l2 guest could rewrite the intercept bitmap to 0 and  
>> compromize the host.
>
> L1 hsave stores the architected state saved by vmrun, e.g. cs.sel,  
> next_rip, cr0, cr3, etc.  The host intercept bitmap is not state  
> since it is calculated from the L1 intercept bitmap and host code.   
> Indeed it can be different from host to host even with the same  
> guest state.

Ah, so you'd only save off the cpu state parts of the vmcb.

Currently we save off control parts too, so we can easily swap them in  
on #vmexit.

So if we'd migrate off when inside the nested guest, we'd have to save  
off the resume control state, OR them again with the guest vmcb  
control states and be inside the nested guest.

Wouldn't it be much easier to not migrate / save state when inside a  
nested guest? I'm afraid the code will become overly complex if we do  
allow migration while in a nested context.

Alex
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-25  9:49     ` Avi Kivity
@ 2009-10-26  9:17       ` Joerg Roedel
  2009-10-26  9:21         ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Joerg Roedel @ 2009-10-26  9:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
> On 10/24/2009 12:35 PM, Alexander Graf wrote:
> >
> >Hm, thinking about this again, it might be useful to have an
> >"currently in nested VM" flag here. That way userspace can decide
> >if it needs to get out of the nested state (for migration) or if
> >it just doesn't care.
> 
> Getting out of nested state involves modifying state (both memory
> and registers).  Nor can we in the general case force it.  The guest
> can set up a situation where it is impossible to #vmexit.

There is actually more than that. If the guest runs in guest mode itself
we also need to report the host state to be able to do an #vmexit after
migration.
In nested SVM the host state is not saved in the guest memory, to prevent
the guest from modifying it and breaking out of its virtualization jail.

	Joerg



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-26  9:11                             ` Alexander Graf
@ 2009-10-26  9:19                               ` Avi Kivity
  0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2009-10-26  9:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Jan Kiszka, oritw@il.ibm.com, kvm-devel,
	Marcelo Tosatti

On 10/26/2009 11:11 AM, Alexander Graf wrote:
>> L1 hsave stores the architected state saved by vmrun, e.g. cs.sel, 
>> next_rip, cr0, cr3, etc.  The host intercept bitmap is not state 
>> since it is calculated from the L1 intercept bitmap and host code.  
>> Indeed it can be different from host to host even with the same guest 
>> state.
>
>
> Ah, so you'd only save off the cpu state parts of the vmcb.
>
> Currently we save off control parts too, so we can easily swap them in 
> on #vmexit.

These can still be saved in a host memory area as an optimization, and 
regenerated if needed.

> So if we'd migrate off when inside the nested guest, we'd have to save 
> off the resume control state, OR them again with the guest vmcb 
> control states and be inside the nested guest.

Right, if the new state bit (guest mode) is set, we look at the control 
bits and OR them into the vmcb.  That part can be reused with the VMRUN 
code.
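
Roughly what I have in mind, as a sketch only: the struct below is a
simplified stand-in for the intercept fields of the vmcb control area,
and the function names are made up. It is not the existing svm.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the vmcb intercept fields. */
struct intercepts_sketch {
	uint16_t cr_read;
	uint16_t cr_write;
	uint32_t exceptions;
	uint64_t insn;
};

/* OR the intercepts requested by L1 into the ones the host always needs. */
static struct intercepts_sketch
merge_intercepts(struct intercepts_sketch host, struct intercepts_sketch l1)
{
	struct intercepts_sketch merged = {
		.cr_read    = (uint16_t)(host.cr_read  | l1.cr_read),
		.cr_write   = (uint16_t)(host.cr_write | l1.cr_write),
		.exceptions = host.exceptions | l1.exceptions,
		.insn       = host.insn       | l1.insn,
	};
	return merged;
}

/*
 * On restore: if the (hypothetical) guest-mode bit is set, rebuild the
 * merged intercepts from the L1 control bits; otherwise only the host's
 * own intercepts apply. The VMRUN emulation could share the merge step.
 */
static struct intercepts_sketch
effective_intercepts(bool guest_mode, struct intercepts_sketch host,
		     struct intercepts_sketch l1)
{
	return guest_mode ? merge_intercepts(host, l1) : host;
}
```

The point is that the merged bitmap is derived data: only the
guest-mode bit and the L1 control bits would need to be carried over.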

>
> Wouldn't it be much easier to not migrate / save state when inside a 
> nested guest? I'm afraid the code will become overly complex if we do 
> allow migration while in a nested context.

I can't really see why but then I don't know the code as well as you 
do.  The current code won't work for guests which don't intercept 
external interrupts (probably only malware).  For nested vmx it may be 
necessary since vmx has a mode where interrupts are acknowledged during 
#VMEXIT and the interrupt vector is saved into a register; you can't 
fake an interrupt #VMEXIT since you can't fake the vector.  Xen is one 
guest which uses this mode.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-26  9:17       ` Joerg Roedel
@ 2009-10-26  9:21         ` Avi Kivity
  2009-10-26  9:30           ` Joerg Roedel
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-26  9:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On 10/26/2009 11:17 AM, Joerg Roedel wrote:
> On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
>    
>> On 10/24/2009 12:35 PM, Alexander Graf wrote:
>>      
>>> Hm, thinking about this again, it might be useful to have an
>>> "currently in nested VM" flag here. That way userspace can decide
>>> if it needs to get out of the nested state (for migration) or if
>>> it just doesn't care.
>>>        
>> Getting out of nested state involves modifying state (both memory
>> and registers).  Nor can we in the general case force it.  The guest
>> can set up a situation where it is impossible to #vmexit.
>>      
> There is actually more than that. If the guest runs in guest mode itself
> we also need to report the host state to be able to do an #vmexit after
> migration.
> In nested SVM the host state is not saved in the guest memory to prevent
> the guest from modifying it and break out of its virtualization jail.
>    

Which host state?  As far as I can tell, it can all be regenerated.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-26  9:21         ` Avi Kivity
@ 2009-10-26  9:30           ` Joerg Roedel
  2009-10-26  9:39             ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Joerg Roedel @ 2009-10-26  9:30 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On Mon, Oct 26, 2009 at 11:21:12AM +0200, Avi Kivity wrote:
> On 10/26/2009 11:17 AM, Joerg Roedel wrote:
> >On Sun, Oct 25, 2009 at 11:49:35AM +0200, Avi Kivity wrote:
> >>On 10/24/2009 12:35 PM, Alexander Graf wrote:
> >>>Hm, thinking about this again, it might be useful to have an
> >>>"currently in nested VM" flag here. That way userspace can decide
> >>>if it needs to get out of the nested state (for migration) or if
> >>>it just doesn't care.
> >>Getting out of nested state involves modifying state (both memory
> >>and registers).  Nor can we in the general case force it.  The guest
> >>can set up a situation where it is impossible to #vmexit.
> >There is actually more than that. If the guest runs in guest mode itself
> >we also need to report the host state to be able to do an #vmexit after
> >migration.
> >In nested SVM the host state is not saved in the guest memory to prevent
> >the guest from modifying it and break out of its virtualization jail.
> 
> Which host state?  As far as I can tell, it can all be regenerated.

The state which is loaded into the vcpu when a #vmexit is emulated. This
includes segments, control registers and the host rip for example.

	Joerg



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: List of unaccessible x86 states
  2009-10-26  9:30           ` Joerg Roedel
@ 2009-10-26  9:39             ` Avi Kivity
  2009-10-26  9:56               ` Joerg Roedel
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-26  9:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On 10/26/2009 11:30 AM, Joerg Roedel wrote:
>
>> Which host state?  As far as I can tell, it can all be regenerated.
>>      
> The state which is loaded into the vcpu when a #vmexit is emulated. This
> includes segments, control registers and the host rip for example.
>    

All of this state does not change between nested guest and normal guest 
mode.

-- 
error compiling committee.c: too many arguments to function



* Re: List of unaccessible x86 states
  2009-10-26  9:39             ` Avi Kivity
@ 2009-10-26  9:56               ` Joerg Roedel
  2009-10-26 10:09                 ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Joerg Roedel @ 2009-10-26  9:56 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
> On 10/26/2009 11:30 AM, Joerg Roedel wrote:
> >
> >>Which host state?  As far as I can tell, it can all be regenerated.
> >The state which is loaded into the vcpu when a #vmexit is emulated. This
> >includes segments, control registers and the host rip for example.
> 
> All of this state does not change between nested guest and normal
> guest mode.

I am talking about all the state that is saved in svm->nested.hsave.
When we migrate a guest vcpu while it is running in guest mode itself
(without forcing a nested #vmexit) this state is required when a #vmexit
needs to be emulated on this vcpu after migration.
Same is true for the nested intercept conditions.

	Joerg




* Re: List of unaccessible x86 states
  2009-10-26  9:56               ` Joerg Roedel
@ 2009-10-26 10:09                 ` Avi Kivity
  2009-10-26 10:45                   ` Joerg Roedel
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-26 10:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On 10/26/2009 11:56 AM, Joerg Roedel wrote:
> On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
>    
>> On 10/26/2009 11:30 AM, Joerg Roedel wrote:
>>      
>>>        
>>>> Which host state?  As far as I can tell, it can all be regenerated.
>>>>          
>>> The state which is loaded into the vcpu when a #vmexit is emulated. This
>>> includes segments, control registers and the host rip for example.
>>>        
>> All of this state does not change between nested guest and normal
>> guest mode.
>>      
> I am talking about all the state that is saved in svm->nested.hsave.
> When we migrate a guest vcpu while it is running in guest mode itself
> (without forcing a nested #vmexit) this state is required when a #vmexit
> needs to be emulated on this vcpu after migration.
> Same is true for the nested intercept conditions.
>    

The state that is saved by VMRUN can be saved to guest memory and 
migrated.  Extra state (like the intercepts for the previous mode) must 
be saved to host memory and not migrated; host intercepts can be 
regenerated.

Concretely:


     hsave->save.es     = vmcb->save.es;
     hsave->save.cs     = vmcb->save.cs;
     hsave->save.ss     = vmcb->save.ss;
     hsave->save.ds     = vmcb->save.ds;
     hsave->save.gdtr   = vmcb->save.gdtr;
     hsave->save.idtr   = vmcb->save.idtr;
     hsave->save.efer   = svm->vcpu.arch.shadow_efer;
     hsave->save.cr0    = svm->vcpu.arch.cr0;
     hsave->save.cr4    = svm->vcpu.arch.cr4;
     hsave->save.rflags = vmcb->save.rflags;
     hsave->save.rip    = svm->next_rip;
     hsave->save.rsp    = vmcb->save.rsp;
     hsave->save.rax    = vmcb->save.rax;
     if (npt_enabled)
         hsave->save.cr3    = vmcb->save.cr3;
     else
         hsave->save.cr3    = svm->vcpu.arch.cr3;


Can all be saved to guest memory.

     copy_vmcb_control_area(hsave, vmcb);

Must not be saved into guest memory.  On the other hand, it is not 
needed for migration.
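
The split described above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the *_sketch structures are
hypothetical, heavily reduced stand-ins for the real vmcb layout, and
hsave_export_to_guest()/hsave_import_from_guest() are made-up helper names,
not existing kvm functions. Validation of what is read back from guest
memory is left out entirely, and that is exactly the part that would need
care.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the vmcb layout (not the kernel's). */
struct seg_sketch {
    uint16_t selector;
    uint16_t attrib;
    uint32_t limit;
    uint64_t base;
};

struct save_area_sketch {
    struct seg_sketch es, cs, ss, ds, gdtr, idtr;
    uint64_t efer, cr0, cr3, cr4;
    uint64_t rflags, rip, rsp, rax;
};

struct control_area_sketch {
    uint64_t intercept;     /* host intercepts, regenerated on the target */
};

struct vmcb_sketch {
    struct control_area_sketch control;
    struct save_area_sketch save;
};

/*
 * Copy only the save area into a buffer standing in for the guest's
 * host save page. The control area never leaves host memory.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t hsave_export_to_guest(const struct vmcb_sketch *hsave,
                             void *guest_buf, size_t len)
{
    if (len < sizeof(hsave->save))
        return 0;
    memcpy(guest_buf, &hsave->save, sizeof(hsave->save));
    return sizeof(hsave->save);
}

/*
 * Reload the save area when a #vmexit is emulated. The control area of
 * *hsave is left untouched. Sanity checks on the loaded values are
 * omitted in this sketch. Returns 0 on success, -1 on a short buffer.
 */
int hsave_import_from_guest(struct vmcb_sketch *hsave,
                            const void *guest_buf, size_t len)
{
    if (len < sizeof(hsave->save))
        return -1;
    memcpy(&hsave->save, guest_buf, sizeof(hsave->save));
    return 0;
}
```

With this shape the save area travels with guest memory during migration,
while the control area has to be rebuilt on the destination.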


-- 
error compiling committee.c: too many arguments to function



* Re: List of unaccessible x86 states
  2009-10-26 10:09                 ` Avi Kivity
@ 2009-10-26 10:45                   ` Joerg Roedel
  2009-10-26 10:56                     ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Joerg Roedel @ 2009-10-26 10:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On Mon, Oct 26, 2009 at 12:09:25PM +0200, Avi Kivity wrote:
> On 10/26/2009 11:56 AM, Joerg Roedel wrote:
> >On Mon, Oct 26, 2009 at 11:39:46AM +0200, Avi Kivity wrote:
> >>On 10/26/2009 11:30 AM, Joerg Roedel wrote:
> >>>>Which host state?  As far as I can tell, it can all be regenerated.
> >>>The state which is loaded into the vcpu when a #vmexit is emulated. This
> >>>includes segments, control registers and the host rip for example.
> >>All of this state does not change between nested guest and normal
> >>guest mode.
> >I am talking about all the state that is saved in svm->nested.hsave.
> >When we migrate a guest vcpu while it is running in guest mode itself
> >(without forcing a nested #vmexit) this state is required when a #vmexit
> >needs to be emulated on this vcpu after migration.
> >Same is true for the nested intercept conditions.
> 
> The state that is saved by VMRUN can be saved to guest memory and
> migrated.  Extra state (like the intercepts for the previous mode)
> must be saved to host memory and not migrated; host intercepts can
> be regenerated.

Ok, parts of the state can be saved in guest memory. But that's
currently not done. This will need some care to not introduce a security
hole. But it shouldn't be too difficult.
The state that's not reproducible in a sane way is the intercept bitmap
for the l2 guest.
From the nested state what needs to be exposed to userspace for
migration is:

* guest mode flag (as returned by is_nested)
* nested vmcb address
* nested hsave msr
* nested intercepts
* for nested nested paging: guest nested cr3 value

Another state which needs exposure is the last branch record related
state.
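
To make the list above more concrete, here is a rough sketch of what such a
substate could look like. Everything in it is an assumption for
illustration: struct nested_state_sketch, the flag name and the get/set
helpers are hypothetical and not part of any existing interface, and the
intercepts are collapsed into a single 64-bit word although the real vmcb
has several intercept fields. LBR state is left out here.

```c
#include <stdint.h>
#include <string.h>

#define NESTED_SKETCH_GUEST_MODE (1u << 0)  /* hypothetical flag */

/* Hypothetical userspace-visible layout for the nested state. */
struct nested_state_sketch {
    uint32_t flags;        /* NESTED_SKETCH_GUEST_MODE if running l2 */
    uint32_t pad;
    uint64_t vmcb_gpa;     /* nested vmcb address */
    uint64_t hsave_msr;    /* nested hsave msr */
    uint64_t intercepts;   /* nested intercepts (simplified to one word) */
    uint64_t nested_cr3;   /* guest nested cr3 for nested nested paging */
};

/* Hypothetical stand-in for the in-kernel per-vcpu nested state. */
struct vcpu_nested_sketch {
    int      in_guest_mode;
    uint64_t vmcb;
    uint64_t hsave_msr;
    uint64_t intercept;
    uint64_t nested_cr3;
};

/* Fill the userspace structure from the in-kernel state. */
void nested_state_get(const struct vcpu_nested_sketch *v,
                      struct nested_state_sketch *out)
{
    memset(out, 0, sizeof(*out));
    if (v->in_guest_mode)
        out->flags |= NESTED_SKETCH_GUEST_MODE;
    out->vmcb_gpa   = v->vmcb;
    out->hsave_msr  = v->hsave_msr;
    out->intercepts = v->intercept;
    out->nested_cr3 = v->nested_cr3;
}

/* Restore the in-kernel state; rejects flags this sketch does not know. */
int nested_state_set(struct vcpu_nested_sketch *v,
                     const struct nested_state_sketch *in)
{
    if (in->flags & ~NESTED_SKETCH_GUEST_MODE)
        return -1;
    v->in_guest_mode = (in->flags & NESTED_SKETCH_GUEST_MODE) != 0;
    v->vmcb          = in->vmcb_gpa;
    v->hsave_msr     = in->hsave_msr;
    v->intercept     = in->intercepts;
    v->nested_cr3    = in->nested_cr3;
    return 0;
}
```

Rejecting unknown flags on the set path leaves room to extend the layout
later without silently accepting state the kernel cannot restore.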

Off-topic question: Will the new migration protocol include some kind
               of handshake to find out if migration is possible at all?

	Joerg




* Re: List of unaccessible x86 states
  2009-10-26 10:45                   ` Joerg Roedel
@ 2009-10-26 10:56                     ` Avi Kivity
  2009-10-26 11:10                       ` Joerg Roedel
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2009-10-26 10:56 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On 10/26/2009 12:45 PM, Joerg Roedel wrote:
>
> Ok, parts of the state can be saved in guest memory. But that's
> currently not done. This will need some care to not introduce a security
> hole. But it shouldn't be too difficult.
> The state that's not reproducible in a sane way is the intercept bitmap
> for the l2 guest.
>  From the nested state what needs to be exposed to userspace for
> migration is:
>
> * guest mode flag (as returned by is_nested)
> * nested vmcb address
>    

Yes, forgot that.  We can store it in the hsave area (note the hsave 
area format becomes an ABI).

> * nested hsave msr
>    

That's already saved.

> * nested intercepts
>    

These are part of the guest vmcb.  The host nested intercepts can be 
recalculated, no?

> * for nested nested paging: guest nested cr3 value
>    

Part of the guest vmcb.

> Another state which needs exposure is the last branch record related
> state.
>    

Aren't those just more MSRs?

> Off-topic question: Will the new migration protocol include some kind
>                 of handshake to find out if migration is possible at all?
>
>    

It's assumed that migration always works for a newer qemu version, and 
that the management tools don't attempt backward migration.

-- 
error compiling committee.c: too many arguments to function



* Re: List of unaccessible x86 states
  2009-10-26 10:56                     ` Avi Kivity
@ 2009-10-26 11:10                       ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2009-10-26 11:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Jan Kiszka, kvm-devel list, Marcelo Tosatti,
	Gleb Natapov

On Mon, Oct 26, 2009 at 12:56:31PM +0200, Avi Kivity wrote:
> On 10/26/2009 12:45 PM, Joerg Roedel wrote:


> >* nested intercepts
> 
> These are part of the guest vmcb.  The host nested intercepts can be
> recalculated, no?
> 
> >* for nested nested paging: guest nested cr3 value
> 
> Part of the guest vmcb.

This will work in most cases. But it's not architecturally sane because
real hardware caches this information in the cpu. So software is free to
modify the vmcb without impacting the in-cpu state until the next
#vmexit. I don't know any software which relies on that so it may not be
an issue.
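
A toy model of the caching behaviour described above, to show why
re-reading the guest vmcb is not strictly equivalent. The types and the
sketch_* helpers are hypothetical and model nothing beyond "values are
latched at vmrun and stay fixed until the next #vmexit"; the intercepts are
again simplified to one 64-bit word.

```c
#include <stdint.h>

/* The vmcb as it sits in guest memory (reduced, hypothetical). */
struct vmcb_mem_sketch {
    uint64_t intercept;
    uint64_t nested_cr3;
};

/* The copy the cpu keeps internally while the guest runs. */
struct cpu_sketch {
    int      in_guest_mode;
    uint64_t cached_intercept;
    uint64_t cached_nested_cr3;
};

/* vmrun latches the control values; later vmcb writes are not seen. */
void sketch_vmrun(struct cpu_sketch *cpu, const struct vmcb_mem_sketch *vmcb)
{
    cpu->cached_intercept  = vmcb->intercept;
    cpu->cached_nested_cr3 = vmcb->nested_cr3;
    cpu->in_guest_mode     = 1;
}

/* #vmexit leaves guest mode; the cached values are no longer in effect. */
void sketch_vmexit(struct cpu_sketch *cpu)
{
    cpu->in_guest_mode = 0;
}

/* Intercept checks consult the cached copy, never guest memory. */
int sketch_is_intercepted(const struct cpu_sketch *cpu, unsigned int bit)
{
    if (!cpu->in_guest_mode || bit >= 64)
        return 0;
    return (int)((cpu->cached_intercept >> bit) & 1);
}
```

If the vmcb in memory is changed between sketch_vmrun() and sketch_vmexit(),
sketch_is_intercepted() keeps answering from the latched value. A migration
that rebuilds the cached copy from guest memory would pick up the modified
value instead, which is the corner case in question.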
 
> >Off-topic question: Will the new migration protocol include some kind
> >                of handshake to find out if migration is possible at all?
> >
> 
> It's assumed that migration always works for a newer qemu version,
> and that the management tools don't attempt backward migration.

I think such a handshake would make sense just to prevent a nested
svm hypervisor from being migrated to an intel machine or vice versa (just
an example, there are more like sse*, nested nested paging, ...).
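
Such a check could be as simple as comparing feature masks before any state
is sent. A minimal sketch, assuming each side can describe itself with a
bitmask: the MIG_FEAT_* names and both helpers are hypothetical and not part
of any existing migration protocol.

```c
#include <stdint.h>

/* Hypothetical feature bits a source could report as "in use". */
#define MIG_FEAT_NESTED_SVM  (UINT64_C(1) << 0)
#define MIG_FEAT_NESTED_VMX  (UINT64_C(1) << 1)
#define MIG_FEAT_NESTED_NPT  (UINT64_C(1) << 2)

/* Bits the guest uses on the source that the target does not offer. */
uint64_t mig_missing_features(uint64_t in_use, uint64_t offered)
{
    return in_use & ~offered;
}

/* Migration is only attempted when nothing is missing on the target. */
int mig_possible(uint64_t in_use, uint64_t offered)
{
    return mig_missing_features(in_use, offered) == 0;
}
```

A guest using MIG_FEAT_NESTED_SVM would be refused by a target that only
offers MIG_FEAT_NESTED_VMX, and the missing-bits mask tells the management
tool why.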

	Joerg




end of thread, other threads:[~2009-10-26 11:10 UTC | newest]

Thread overview: 40+ messages
2009-10-20 13:01 List of unaccessible x86 states Jan Kiszka
2009-10-20 13:10 ` Alexander Graf
2009-10-20 13:19   ` Jan Kiszka
2009-10-20 13:27     ` Gleb Natapov
2009-10-20 13:29       ` Jan Kiszka
2009-10-20 13:32         ` Gleb Natapov
2009-10-20 13:27     ` Alexander Graf
2009-10-20 13:37   ` Jan Kiszka
2009-10-20 13:41     ` Alexander Graf
2009-10-20 13:48       ` Gleb Natapov
2009-10-20 13:51         ` Alexander Graf
2009-10-20 18:55           ` Gleb Natapov
2009-10-20 18:59             ` Alexander Graf
2009-10-20 19:09               ` Gleb Natapov
2009-10-20 19:23                 ` Alexander Graf
2009-10-20 19:31                   ` Gleb Natapov
2009-10-25  9:46                   ` Avi Kivity
2009-10-25 13:53                     ` Alexander Graf
2009-10-25 14:08                       ` Avi Kivity
2009-10-25 16:45                         ` Alexander Graf
2009-10-26  8:33                           ` Avi Kivity
2009-10-26  9:11                             ` Alexander Graf
2009-10-26  9:19                               ` Avi Kivity
2009-10-20 13:35 ` Gleb Natapov
2009-10-20 18:45 ` Marcelo Tosatti
2009-10-23 13:08   ` Jan Kiszka
2009-10-23 17:00     ` Marcelo Tosatti
2009-10-23 19:26       ` Jan Kiszka
2009-10-23 19:34 ` Jan Kiszka
2009-10-24 10:35   ` Alexander Graf
2009-10-25  9:49     ` Avi Kivity
2009-10-26  9:17       ` Joerg Roedel
2009-10-26  9:21         ` Avi Kivity
2009-10-26  9:30           ` Joerg Roedel
2009-10-26  9:39             ` Avi Kivity
2009-10-26  9:56               ` Joerg Roedel
2009-10-26 10:09                 ` Avi Kivity
2009-10-26 10:45                   ` Joerg Roedel
2009-10-26 10:56                     ` Avi Kivity
2009-10-26 11:10                       ` Joerg Roedel
