kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Heads up: More user-unaccessible x86 states?
       [not found]     ` <4AC88BF2.7080200@redhat.com>
@ 2009-10-04 19:07       ` Jan Kiszka
  2009-10-05  6:18         ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-10-04 19:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]

Avi Kivity wrote:
> On 10/04/2009 12:50 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>   
>>> On 10/04/2009 10:59 AM, Jan Kiszka wrote:
>>>     
>>>> Hi,
>>>>
>>>> while preparing new IOCTLs to let user space query&   set the yet
>>>> unaccessible NMI states (pending and masked) I also came across the
>>>> interrupt shadow masks. Unless I missed something I would say that
>>>> we so
>>>> far break them in the rare case that a migration happens right while
>>>> any
>>>> of them is asserted. So I guess I should extend my interface and stuff
>>>> them in as well.
>>>>
>>>> Do we have more of such unaccessible states on x86 that could be
>>>> included, too? Would be a good chance...
>>>>
>>>>        
>>> There's some hidden state in the cpuid mechanism.  I think we expose it
>>> though (just don't use it in qemu).
>>>      
>> Do you have more details on this?
>>    
> 
> Some cpuid leaves return different information based on an internal
> counter.  This is indicated by KVM_CPUID_FLAG_STATEFUL_FUNC.
> 
>> The PDPTRs are hidden state that we should save/restore, though no sane
>> guest relies on them.
>>    A quick glance at SVM makes me think that those registered are not
>> exposed there. So when keeping in mind that we may only help VMX guests,
>> I think i makes even less sense to "fix" this, does it?
>>    
> 
> Yes.  With npt the PDPTRs are essentially gone.
> 
>>> I think we can lose information if we migrate during a SIPI
>>> (sipi_vector), though that might be fixable without exposing it.
>>>      
>> Hmm, I see. But even it it's not fixable, such an extension would be an
>> in-kernel irqchip thing.
>>    
> 
> Yes.
> 
>>> We'll might also lost debug traps.
>>>
>>> We drop pending exceptions; normally that's fine since they'll reinject
>>> themselves, but MCE will not.
>>>      
>> So would it make sense and fix those two issues when we simply save and
>> restore the pending exception?
>>    
> 
> Yes.
> 
> btw, instead of adding a new ioctl, perhaps it makes sense to define a
> new KVM_VCPU_STATE structure that holds all current and future state
> (with generous reserved space), instead of separating state over a dozen
> ioctls.
> 

OK, makes sense. With our without lapic state? How much "future space"?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-04 19:07       ` Heads up: More user-unaccessible x86 states? Jan Kiszka
@ 2009-10-05  6:18         ` Avi Kivity
  2009-10-05  7:43           ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-10-05  6:18 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel

On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>> new KVM_VCPU_STATE structure that holds all current and future state
>> (with generous reserved space), instead of separating state over a dozen
>> ioctls.
>>
>>      
> OK, makes sense. With our without lapic state?

I'm in two minds.  I'm leaning towards including lapic but would welcome 
arguments either way.

Note we have to be careful with timers such as the tsc and lapic timer.  
Maybe have a bitmask at the front specifying which elements are active.

> How much "future space"?
>    

avx will change the sse registers from 16x16 to 16x32, with a hint of 
more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are 
potentially endless.  Lots of space.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05  6:18         ` Avi Kivity
@ 2009-10-05  7:43           ` Jan Kiszka
  2009-10-05  8:55             ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-10-05  7:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]

Avi Kivity wrote:
> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>> new KVM_VCPU_STATE structure that holds all current and future state
>>> (with generous reserved space), instead of separating state over a dozen
>>> ioctls.
>>>
>>>      
>> OK, makes sense. With our without lapic state?
> 
> I'm in two minds.  I'm leaning towards including lapic but would welcome
> arguments either way.

The lapic is optional and, thus, typically handled in different code
modules by user space. QEMU even creates a separate device that holds
the state. I'm not sure user space will benefit from a unified query/set
interface with regard to this.

> 
> Note we have to be careful with timers such as the tsc and lapic timer. 
> Maybe have a bitmask at the front specifying which elements are active.

...and the lapic timers are another argument.

Regarding TSC, which means MSRs: I tend to include only states into the
this meta state which have fixed sizes. Otherwise things will get very
hairy. And the GET/SET_MSRS interface is already fairly flexible, the
question would be again: What can we gain by unifying?

> 
>> How much "future space"?
>>    
> 
> avx will change the sse registers from 16x16 to 16x32, with a hint of
> more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are
> potentially endless.  Lots of space.
> 

Hmm, a some kB then (even without MSRs)...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05  7:43           ` Jan Kiszka
@ 2009-10-05  8:55             ` Avi Kivity
  2009-10-05 11:18               ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-10-05  8:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel

On 10/05/2009 09:43 AM, Jan Kiszka wrote:
> Avi Kivity wrote:
>    
>> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>      
>>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>>> new KVM_VCPU_STATE structure that holds all current and future state
>>>> (with generous reserved space), instead of separating state over a dozen
>>>> ioctls.
>>>>
>>>>
>>>>          
>>> OK, makes sense. With our without lapic state?
>>>        
>> I'm in two minds.  I'm leaning towards including lapic but would welcome
>> arguments either way.
>>      
> The lapic is optional and, thus, typically handled in different code
> modules by user space. QEMU even creates a separate device that holds
> the state.

avx registers, nested vmx are optional as well.

> I'm not sure user space will benefit from a unified query/set
> interface with regard to this.
>    

The main benefit is to avoid creating an ioctl each time we find a 
missing bit.

>> Note we have to be careful with timers such as the tsc and lapic timer.
>> Maybe have a bitmask at the front specifying which elements are active.
>>      
> ...and the lapic timers are another argument.
>
> Regarding TSC, which means MSRs: I tend to include only states into the
> this meta state which have fixed sizes. Otherwise things will get very
> hairy. And the GET/SET_MSRS interface is already fairly flexible, the
> question would be again: What can we gain by unifying?
>    

For MSRs, not much.

Note we can make it work, by storing an offset/length at a fixed 
location and letting userspace point it into the reserved area.

>>> How much "future space"?
>>>
>>>        
>> avx will change the sse registers from 16x16 to 16x32, with a hint of
>> more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are
>> potentially endless.  Lots of space.
>>
>>      
> Hmm, a some kB then (even without MSRs)...
>    

Yup, it's amazing how much state modern processors hold.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05  8:55             ` Avi Kivity
@ 2009-10-05 11:18               ` Jan Kiszka
  2009-10-05 12:05                 ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-10-05 11:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

Avi Kivity wrote:
> On 10/05/2009 09:43 AM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>   
>>> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>>     
>>>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>>>> new KVM_VCPU_STATE structure that holds all current and future state
>>>>> (with generous reserved space), instead of separating state over a
>>>>> dozen
>>>>> ioctls.
>>>>>
>>>>>
>>>>>          
>>>> OK, makes sense. With our without lapic state?
>>>>        
>>> I'm in two minds.  I'm leaning towards including lapic but would welcome
>>> arguments either way.
>>>      
>> The lapic is optional and, thus, typically handled in different code
>> modules by user space. QEMU even creates a separate device that holds
>> the state.
> 
> avx registers, nested vmx are optional as well.
> 
>> I'm not sure user space will benefit from a unified query/set
>> interface with regard to this.
>>    
> 
> The main benefit is to avoid creating an ioctl each time we find a
> missing bit.
> 
>>> Note we have to be careful with timers such as the tsc and lapic timer.
>>> Maybe have a bitmask at the front specifying which elements are active.
>>>      
>> ...and the lapic timers are another argument.
>>
>> Regarding TSC, which means MSRs: I tend to include only states into the
>> this meta state which have fixed sizes. Otherwise things will get very
>> hairy. And the GET/SET_MSRS interface is already fairly flexible, the
>> question would be again: What can we gain by unifying?
>>    
> 
> For MSRs, not much.
> 
> Note we can make it work, by storing an offset/length at a fixed
> location and letting userspace point it into the reserved area.

Hmm, pointers... That makes me think of a meta IOCTL like this:

struct kvm_vcpu_state {
	int substates;
	void __user *substate[0];
};

#define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
#define KVM_VCPU_STATE_SREGS 1
#define KVM_VCPU_STATE_LAPIC 2
...

We could easily extend the call with more substates just by defining new
pointer slots. Moreover, user space could define which substates should
be get/set by simply passing NULL or a valid pointer for substate[n] (or
by limiting the substates field).

The only ugliness I see is the missing type safety as we would have to
deal with void pointers to the substate structures here.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05 11:18               ` Jan Kiszka
@ 2009-10-05 12:05                 ` Avi Kivity
  2009-10-05 12:18                   ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-10-05 12:05 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel

On 10/05/2009 01:18 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>    
>> On 10/05/2009 09:43 AM, Jan Kiszka wrote:
>>      
>>> Avi Kivity wrote:
>>>
>>>        
>>>> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>>>
>>>>          
>>>>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>>>>> new KVM_VCPU_STATE structure that holds all current and future state
>>>>>> (with generous reserved space), instead of separating state over a
>>>>>> dozen
>>>>>> ioctls.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>> OK, makes sense. With our without lapic state?
>>>>>
>>>>>            
>>>> I'm in two minds.  I'm leaning towards including lapic but would welcome
>>>> arguments either way.
>>>>
>>>>          
>>> The lapic is optional and, thus, typically handled in different code
>>> modules by user space. QEMU even creates a separate device that holds
>>> the state.
>>>        
>> avx registers, nested vmx are optional as well.
>>
>>      
>>> I'm not sure user space will benefit from a unified query/set
>>> interface with regard to this.
>>>
>>>        
>> The main benefit is to avoid creating an ioctl each time we find a
>> missing bit.
>>
>>      
>>>> Note we have to be careful with timers such as the tsc and lapic timer.
>>>> Maybe have a bitmask at the front specifying which elements are active.
>>>>
>>>>          
>>> ...and the lapic timers are another argument.
>>>
>>> Regarding TSC, which means MSRs: I tend to include only states into the
>>> this meta state which have fixed sizes. Otherwise things will get very
>>> hairy. And the GET/SET_MSRS interface is already fairly flexible, the
>>> question would be again: What can we gain by unifying?
>>>
>>>        
>> For MSRs, not much.
>>
>> Note we can make it work, by storing an offset/length at a fixed
>> location and letting userspace point it into the reserved area.
>>      
> Hmm, pointers... That makes me think of a meta IOCTL like this:
>
> struct kvm_vcpu_state {
> 	int substates;
> 	void __user *substate[0];
> };
>
>    

True pointers are no go since we have to deal with compat code (31/32 
bit userspace calling into a 64 bit kernel).  Offsets into the state 
structure are easier.

> #define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
> #define KVM_VCPU_STATE_SREGS 1
> #define KVM_VCPU_STATE_LAPIC 2
> ...
>
> We could easily extend the call with more substates just by defining new
> pointer slots. Moreover, user space could define which substates should
> be get/set by simply passing NULL or a valid pointer for substate[n] (or
> by limiting the substates field).
>
> The only ugliness I see is the missing type safety as we would have to
> deal with void pointers to the substate structures here.
>    

For fixed sized state a feature bitmap is sufficient I think.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05 12:05                 ` Avi Kivity
@ 2009-10-05 12:18                   ` Jan Kiszka
  2009-10-05 12:34                     ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-10-05 12:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

Avi Kivity wrote:
> On 10/05/2009 01:18 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>    
>>> On 10/05/2009 09:43 AM, Jan Kiszka wrote:
>>>      
>>>> Avi Kivity wrote:
>>>>
>>>>        
>>>>> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>>>>
>>>>>          
>>>>>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>>>>>> new KVM_VCPU_STATE structure that holds all current and future state
>>>>>>> (with generous reserved space), instead of separating state over a
>>>>>>> dozen
>>>>>>> ioctls.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>> OK, makes sense. With our without lapic state?
>>>>>>
>>>>>>            
>>>>> I'm in two minds.  I'm leaning towards including lapic but would welcome
>>>>> arguments either way.
>>>>>
>>>>>          
>>>> The lapic is optional and, thus, typically handled in different code
>>>> modules by user space. QEMU even creates a separate device that holds
>>>> the state.
>>>>        
>>> avx registers, nested vmx are optional as well.
>>>
>>>      
>>>> I'm not sure user space will benefit from a unified query/set
>>>> interface with regard to this.
>>>>
>>>>        
>>> The main benefit is to avoid creating an ioctl each time we find a
>>> missing bit.
>>>
>>>      
>>>>> Note we have to be careful with timers such as the tsc and lapic timer.
>>>>> Maybe have a bitmask at the front specifying which elements are active.
>>>>>
>>>>>          
>>>> ...and the lapic timers are another argument.
>>>>
>>>> Regarding TSC, which means MSRs: I tend to include only states into the
>>>> this meta state which have fixed sizes. Otherwise things will get very
>>>> hairy. And the GET/SET_MSRS interface is already fairly flexible, the
>>>> question would be again: What can we gain by unifying?
>>>>
>>>>        
>>> For MSRs, not much.
>>>
>>> Note we can make it work, by storing an offset/length at a fixed
>>> location and letting userspace point it into the reserved area.
>>>      
>> Hmm, pointers... That makes me think of a meta IOCTL like this:
>>
>> struct kvm_vcpu_state {
>> 	int substates;
>> 	void __user *substate[0];
>> };
>>
>>    
> 
> True pointers are no go since we have to deal with compat code (31/32 
> bit userspace calling into a 64 bit kernel).  Offsets into the state 
> structure are easier.

So current KVM_GET_DIRTY_LOG is broken in the compat case...

> 
>> #define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
>> #define KVM_VCPU_STATE_SREGS 1
>> #define KVM_VCPU_STATE_LAPIC 2
>> ...
>>
>> We could easily extend the call with more substates just by defining new
>> pointer slots. Moreover, user space could define which substates should
>> be get/set by simply passing NULL or a valid pointer for substate[n] (or
>> by limiting the substates field).
>>
>> The only ugliness I see is the missing type safety as we would have to
>> deal with void pointers to the substate structures here.
>>    
> 
> For fixed sized state a feature bitmap is sufficient I think.
> 

We'll probably have to deal with both. Therefore, I'm looking for a
unified solution.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05 12:18                   ` Jan Kiszka
@ 2009-10-05 12:34                     ` Avi Kivity
  2009-10-05 12:42                       ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-10-05 12:34 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel

On 10/05/2009 02:18 PM, Jan Kiszka wrote:
>
>> True pointers are no go since we have to deal with compat code (31/32
>> bit userspace calling into a 64 bit kernel).  Offsets into the state
>> structure are easier.
>>      
> So current KVM_GET_DIRTY_LOG is broken in the compat case...
>
>    

Yes, for big-endian 32/64 and s390.  There are some patches floating around.

> We'll probably have to deal with both. Therefore, I'm looking for a
> unified solution.
>
>    

array of

struct {
    __u16 type;
    __u16 offset;
} ?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05 12:34                     ` Avi Kivity
@ 2009-10-05 12:42                       ` Jan Kiszka
  2009-10-05 12:55                         ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-10-05 12:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel

Avi Kivity wrote:
> On 10/05/2009 02:18 PM, Jan Kiszka wrote:
>>> True pointers are no go since we have to deal with compat code (31/32
>>> bit userspace calling into a 64 bit kernel).  Offsets into the state
>>> structure are easier.
>>>      
>> So current KVM_GET_DIRTY_LOG is broken in the compat case...
>>
>>    
> 
> Yes, for big-endian 32/64 and s390.  There are some patches floating around.

Well, that's for fixing up the endianess of the bitmap itself. But the
problem with void * in compat code are their different sizes. And
GET_DIRTY_LOG solves this via padding:

	union {
		void __user *dirty_bitmap;
		__u64 padding2;
	};

So this should not make pointers a no-go, should it?

> 
>> We'll probably have to deal with both. Therefore, I'm looking for a
>> unified solution.
>>
>>    
> 
> array of
> 
> struct {
>     __u16 type;
>     __u16 offset;
> } ?
> 

For sure possible, just the setup of such data structure in user space
gets a bit, well, unhandy.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Heads up: More user-unaccessible x86 states?
  2009-10-05 12:42                       ` Jan Kiszka
@ 2009-10-05 12:55                         ` Avi Kivity
  0 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2009-10-05 12:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm-devel

On 10/05/2009 02:42 PM, Jan Kiszka wrote:
>
>> Yes, for big-endian 32/64 and s390.  There are some patches floating around.
>>      
> Well, that's for fixing up the endianess of the bitmap itself. But the
> problem with void * in compat code are their different sizes. And
> GET_DIRTY_LOG solves this via padding:
>
> 	union {
> 		void __user *dirty_bitmap;
> 		__u64 padding2;
> 	};
>
> So this should not make pointers a no-go, should it?
>    

No, it doesn't work.  Big-endian will place the pointer at offset zero 
which is the high-end word when read by the host; and 31-bit s390 needs 
something unsurprisingly strange to be done to the pointer.

The patches I mentioned involve creating a compat_ioctl callback, 
something which I tried to avoid but failed, not having considered 
big-endian and s390.

>>> We'll probably have to deal with both. Therefore, I'm looking for a
>>> unified solution.
>>>
>>>
>>>        
>> array of
>>
>> struct {
>>      __u16 type;
>>      __u16 offset;
>> } ?
>>
>>      
> For sure possible, just the setup of such data structure in user space
> gets a bit, well, unhandy.
>    

Yes.  Fixed-offset substructures are a lot easier and less error-prone.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-10-05 12:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4AC86404.3090209@web.de>
     [not found] ` <4AC87299.4040508@redhat.com>
     [not found]   ` <4AC87E08.5070908@web.de>
     [not found]     ` <4AC88BF2.7080200@redhat.com>
2009-10-04 19:07       ` Heads up: More user-unaccessible x86 states? Jan Kiszka
2009-10-05  6:18         ` Avi Kivity
2009-10-05  7:43           ` Jan Kiszka
2009-10-05  8:55             ` Avi Kivity
2009-10-05 11:18               ` Jan Kiszka
2009-10-05 12:05                 ` Avi Kivity
2009-10-05 12:18                   ` Jan Kiszka
2009-10-05 12:34                     ` Avi Kivity
2009-10-05 12:42                       ` Jan Kiszka
2009-10-05 12:55                         ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).