public inbox for kvm@vger.kernel.org
* KVM call minutes for Nov 30
@ 2010-11-30 15:53 Chris Wright
  2010-11-30 15:59 ` Anthony Liguori
  2010-12-01  9:27 ` Nadav Har'El
  0 siblings, 2 replies; 8+ messages in thread
From: Chris Wright @ 2010-11-30 15:53 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel

2011 KVM Conference
- together with LF event like LinuxCon Vancouver BC (Aug), KS Prague (Nov)
- wider audience
  - include qemu (tcg)
  - include libvirt
  - include xen

0.14.0 release plan
- could push things out, mainly want to keep on track for

infrastructure changes (irc channel migration, git tree migration)
- savannah down
- git.qemu.org was mirror, will start pushing there
- when savannah is back up, will become mirror (so git users should
  still work)
- plan on moving #qemu to OFTC

nested VMX
- no progress, future plans are unclear

qemu users forum in grenoble
- worth having someone there
- goal to get embedded forks to push changes back to qemu

migration with large memory
- switching to 50ms cap likely to cause regression in terms of vcpu runtime
- 50ms qemu mutex contention, brief period of mutex access
  - this has the effect of speeding up migration but giving too little vcpu
    access to qemu mutex (network connections could terminate, for example)
- only fixes to this are to use bw limit or not holding qemu mutex during
  migration
- run Anthony's test load and discuss on list


* Re: KVM call minutes for Nov 30
  2010-11-30 15:53 KVM call minutes for Nov 30 Chris Wright
@ 2010-11-30 15:59 ` Anthony Liguori
  2010-12-01  9:27 ` Nadav Har'El
  1 sibling, 0 replies; 8+ messages in thread
From: Anthony Liguori @ 2010-11-30 15:59 UTC (permalink / raw)
  To: Chris Wright; +Cc: kvm, qemu-devel

On 11/30/2010 09:53 AM, Chris Wright wrote:
> 2011 KVM Conference
> - together with LF event like LinuxCon Vancouver BC (Aug), KS Prague (Nov)
> - wider audience
>    - include qemu (tcg)
>    - include libvirt
>    - include xen
>
> 0.14.0 release plan
> - could push things out, mainly want to keep on track for
>
> infrastructure changes (irc channel migration, git tree migration)
> - savannah down
> - git.qemu.org was mirror, will start pushing there
> - when savannah is back up, will become mirror (so git users should
>    still work)
> - plan on moving #qemu to OFTC
>
> nested VMX
> - no progress, future plans are unclear
>
> qemu users forum in grenoble
> - worth having someone there
> - goal to get embedded forks to push changes back to qemu
>
> migration with large memory
> - switching to 50ms cap likely to cause regression in terms of vcpu runtime
> - 50ms qemu mutex contention, brief period of mutex access
>    - this has the effect of speeding up migration but giving too little vcpu
>      access to qemu mutex (network connections could terminate, for example)
> - only fixes to this are to use bw limit or not holding qemu mutex during
>    migration
>    

Right, to restate this: for some workloads, a VCPU needs to access
qemu_mutex potentially for the majority of its execution.  If we let
migration hold the mutex 95% of the time, then spreading the remaining
5% out over every 50ms does avoid large "stalls", but the improvement
is only superficial.  We're still breaking the migration downtime
contract.

The only solution is to limit the time migration is allowed to run which 
is effectively what bandwidth limiting does.  I'd be willing to 
entertain a bandwidth limit expressed in terms of % CPU although I think 
that's going to be a lot harder to compute than the current bandwidth limit.

And while setting a migration limit does increase migration time, it's
the only solution that preserves fairness unless we stick migration into
a separate thread.

Regards,

Anthony Liguori

> - run Anthony's test load and discuss on list



* Re: KVM call minutes for Nov 30
  2010-11-30 15:53 KVM call minutes for Nov 30 Chris Wright
  2010-11-30 15:59 ` Anthony Liguori
@ 2010-12-01  9:27 ` Nadav Har'El
  2010-12-01 10:28   ` Avi Kivity
  2010-12-06 19:39   ` Nadav Har'El
  1 sibling, 2 replies; 8+ messages in thread
From: Nadav Har'El @ 2010-12-01  9:27 UTC (permalink / raw)
  To: Chris Wright; +Cc: kvm

Hi,

On Tue, Nov 30, 2010, Chris Wright wrote about "KVM call minutes for Nov 30":
> nested VMX
> - no progress, future plans are unclear

Avi Kivity's request to discuss this issue came around an hour before the
call, and I missed it, so I wasn't on the call. Sorry.
I'm the only one doing any coding in nested VMX, so I suggest that next time
that you want to talk about it you check that I'll be there ;-) Or even
better, just use the mailing list to discuss.

But more to the point, what I've been doing recently is implementing Avi's
and Gleb's review comments, which I'm taking seriously, but which often
(unsurprisingly) turn out to be a case of "easier said than done".

Specifically, in the last review I was asked to make sure that shadow-on-EPT
works so that users do not need to remember to add the "ept=0" module option
on L0. Unfortunately, while this should have been relatively simple (and it
DID work at some time in the distant past), there appears to be a bug that I
have spent the last couple of weeks chasing - so far unsuccessfully.

Regarding future plans:

I really want to get nested VMX into KVM, and I'm already doing whatever I
can to make this a reality. But unfortunately, I am not yet a seasoned KVM or
VMX expert (I'm trying to become one...), and it wasn't I who wrote the
original nested VMX code, so every new issue and every new feature that I am
asked to fix is new to me, and takes me considerable time to learn, debug,
and fix.

I hope to continue working on nested VMX next year as well, but IBM's (my
employer's) plans for next year are not yet set in stone. In any case, I
firmly believe that the nested VMX feature will be better, and be available
more quickly, if I am not the only person working on it. I will be very happy
if somebody reading this wants to work on this with me. This is the main
reason why I wanted to put nested VMX in the main tree (even before it is
100% bug-free and feature-complete), because that would make it much easier
for more people to try this feature, and hopefully to help me fix problems
which bother them.

Nadav.

-- 
Nadav Har'El                        |   Wednesday, Dec  1 2010, 24 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |"A mathematician is a device for turning
http://nadav.harel.org.il           |coffee into theorems" -- P. Erdos


* Re: KVM call minutes for Nov 30
  2010-12-01  9:27 ` Nadav Har'El
@ 2010-12-01 10:28   ` Avi Kivity
  2010-12-06 19:39   ` Nadav Har'El
  1 sibling, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2010-12-01 10:28 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/01/2010 11:27 AM, Nadav Har'El wrote:
> I really want to get nested VMX into KVM, and I'm already doing whatever I
> can to make this a reality. But unfortunately, I am not yet a seasoned KVM or
> VMX expert (I'm trying to become one...), and it wasn't I who wrote the
> original nested VMX code, so every new issue and every new feature that I am
> asked to fix is new to me, and takes me considerable time to learn, debug,
> and fix.
>
> I hope to continue working on nested VMX next year as well, but IBM's (my
> employer's) plans for next year are not yet set in stone. In any case, I
> firmly believe that the nested VMX feature will be better, and be available
> more quickly, if I am not the only person working on it. I will be very happy
> if somebody reading this wants to work on this with me. This is the main
> reason why I wanted to put nested VMX in the main tree (even before it is
> 100% bug-free and feature-complete), because that would make it much easier
> for more people to try this feature, and hopefully to help me fix problems
> which bother them.

I am reluctant to lower the bar on entry.  Things like SMP and host EPT 
support are really basic IMO.  Good integration into KVM is also important.

If you would like to get more people participating, I suggest publishing 
a git tree.  People can then post patches against the tree, which you 
can merge into the relevant commits.  It would also get more testing 
done by interested parties.

-- 
error compiling committee.c: too many arguments to function



* Re: KVM call minutes for Nov 30
  2010-12-01  9:27 ` Nadav Har'El
  2010-12-01 10:28   ` Avi Kivity
@ 2010-12-06 19:39   ` Nadav Har'El
  2010-12-07  8:34     ` Avi Kivity
  1 sibling, 1 reply; 8+ messages in thread
From: Nadav Har'El @ 2010-12-06 19:39 UTC (permalink / raw)
  To: Chris Wright; +Cc: kvm, avi

On Wed, Dec 01, 2010, Nadav Har'El wrote about "Re: KVM call minutes for Nov 30":
> Specifically, in the last review I was asked to make sure that shadow-on-EPT
> works so that users do not need to remember to add the "ept=0" module option
> on L0. Unfortunately, while this should have been relatively simple (and it
> DID work at some time in the distant past), there appears to be a bug that I
> have spent the last couple of weeks chasing - so far unsuccessfully.

I was finally able to track this bug down. The issue was incorrect setup of
the four PDPTE (which are called PDPTR in KVM) fields in vmcs02.
These fields are important to set up correctly when using EPT and PAE.
For some reason I have yet to understand, KVM (as an L1) appears to be
setting the PAE bit in its guest. The previous code copied these fields from
vmcs01 to vmcs02, but this is incorrect because these fields need to be
recalculated for each cr3, and the GUEST_CR3 used for running L2 (the shadow
page table set up by L1) is different from the GUEST_CR3 used to run L1
(this one is defined by L1, and untouched by L0 because L0 uses EPT).
We need to emulate what the processor does on a cr3 change when EPT and
PAE are both enabled - i.e., dereference the cr3 value (this requires an EPT
translation) and find the four pointers to be saved in the PDPTR fields.
I have done this, and the shadow-on-EPT case finally works, and there is
no need to use ept=0 on L0 any more :-)

I'm curious, though: why does KVM set PAE for its guest? What is setting
PAE supposed to do while the guest is booting (and thinks it is running in
real mode)? What is setting PAE supposed to do while the guest is running
in long mode?

-- 
Nadav Har'El                        |      Monday, Dec  6 2010, 30 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Amateurs built the ark - professionals
http://nadav.harel.org.il           |built the Titanic.


* Re: KVM call minutes for Nov 30
  2010-12-06 19:39   ` Nadav Har'El
@ 2010-12-07  8:34     ` Avi Kivity
  2010-12-07 12:49       ` Nadav Har'El
  0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2010-12-07  8:34 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/06/2010 09:39 PM, Nadav Har'El wrote:
> On Wed, Dec 01, 2010, Nadav Har'El wrote about "Re: KVM call minutes for Nov 30":
> >  Specifically, in the last review I was asked to make sure that shadow-on-EPT
> >  works so that users do not need to remember to add the "ept=0" module option
> >  on L0. Unfortunately, while this should have been relatively simple (and it
> >  DID work at some time in the distant past), there appears to be a bug that I
> >  have spent the last couple of weeks chasing - so far unsuccessfully.
>
> I was finally able to track this bug down. The issue was incorrect setup of
> the four PDPTE (which are called PDPTR in KVM) fields in vmcs02.
> These fields are important to set up correctly when using EPT and PAE.
> For some reason I have yet to understand, KVM (as an L1) appears to be
> setting the PAE bit in its guest. The previous code copied these fields from
> vmcs01 to vmcs02, but this is incorrect because these fields need to be
> recalculated for each cr3, and the GUEST_CR3 used for running L2 (the shadow
> page table set up by L1) is different from the GUEST_CR3 used to run L1
> (this one is defined by L1, and untouched by L0 because L0 uses EPT).
> We need to emulate what the processor does on a cr3 change when EPT and
> PAE are both enabled - i.e., dereference the cr3 value (this requires an EPT
> translation) and find the four pointers to be saved in the PDPTR fields.
> I have done this, and the shadow-on-EPT case finally works, and there is
> no need to use ept=0 on L0 any more :-)

Great.  I imagine the fixed code is also simpler.  I don't follow what 
you mean by "this requires an EPT translation".  All it requires is a 
kvm_set_cr3() which will load the PDPTEs into the PDPTRs if PAE is 
enabled.  You may need to order the loading of CR0, CR3, CR4, and EFER 
to achieve the desired effect.

> I'm curious, though: why does KVM set PAE for its guest? What is setting
> PAE supposed to do while the guest is booting (and thinks it is running in
> real mode)?

PAE is needed to access >4G of memory.  Otherwise the PTEs are 32 bits 
long and cannot reference all of host memory.

> What is setting PAE supposed to do while the guest is running
> in long mode?

PAE is required by the processor for long mode (independently of
virtualization).

-- 
error compiling committee.c: too many arguments to function



* Re: KVM call minutes for Nov 30
  2010-12-07  8:34     ` Avi Kivity
@ 2010-12-07 12:49       ` Nadav Har'El
  2010-12-07 13:10         ` Avi Kivity
  0 siblings, 1 reply; 8+ messages in thread
From: Nadav Har'El @ 2010-12-07 12:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Wright, kvm

On Tue, Dec 07, 2010, Avi Kivity wrote about "Re: KVM call minutes for Nov 30":
>...
> All it requires is a 
> kvm_set_cr3() which will load the PDPTEs into the PDPTRs if PAE is 
> enabled.  You may need to order the loading of CR0, CR3, CR4, and EFER 
> to achieve the desired effect.

I did this more explicitly as:

	vmcs_writel(GUEST_CR3, get_vmcs12_fields(vcpu)->guest_cr3);
	vcpu->arch.cr3 = get_vmcs12_fields(vcpu)->guest_cr3;
	load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
	vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
	vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
	vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
	vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);

I'm still working on trying to simplify this code - I'll indeed try to see if
I can use kvm_set_cr3 instead. Thanks for the suggestion.
However, even if it works, I am concerned about how nested VMX might break in
the future if kvm_set_cr3 is changed in some way that is irrelevant to nested.

Nadav.

-- 
Nadav Har'El                        |     Tuesday, Dec  7 2010, 30 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
http://nadav.harel.org.il           |how to make AOL's Free CD's edible!


* Re: KVM call minutes for Nov 30
  2010-12-07 12:49       ` Nadav Har'El
@ 2010-12-07 13:10         ` Avi Kivity
  0 siblings, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2010-12-07 13:10 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/07/2010 02:49 PM, Nadav Har'El wrote:
> On Tue, Dec 07, 2010, Avi Kivity wrote about "Re: KVM call minutes for Nov 30":
> >...
> >  All it requires is a
> >  kvm_set_cr3() which will load the PDPTEs into the PDPTRs if PAE is
> >  enabled.  You may need to order the loading of CR0, CR3, CR4, and EFER
> >  to achieve the desired effect.
>
> I did this more explicitly as:
>
> 	vmcs_writel(GUEST_CR3, get_vmcs12_fields(vcpu)->guest_cr3);
> 	vcpu->arch.cr3 = get_vmcs12_fields(vcpu)->guest_cr3;
> 	load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
> 	vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
> 	vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
> 	vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
> 	vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);
>
> I'm still working on trying to simplify this code - I'll indeed try to see if
> I can use kvm_set_cr3 instead. Thanks for the suggestion.
> However, even if it works, I am concerned about how nested VMX might break in
> the future if kvm_set_cr3 is changed in some way that is irrelevant to nested.

It's more correct to use kvm_set_cr3(), since that accounts for all side 
effects.  For example unsynchronized shadow mmu pages need to be 
synced.  If there are new side effects that we don't want in nesting, 
then we'll add a flag to avoid them.

Also need to do the same on the vmexit path (kvm_set_cr3(HOST_CR3)).

This is what svm does (though only for !npt; but it should also work 
unconditionally).

-- 
error compiling committee.c: too many arguments to function


