* KVM call minutes for Nov 30
From: Chris Wright @ 2010-11-30 15:53 UTC
To: kvm; +Cc: qemu-devel

2011 KVM Conference
- together with LF event like LinuxCon Vancouver BC (Aug), KS Prague (Nov)
- wider audience
- include qemu (tcg)
- include libvirt
- include xen

0.14.0 release plan
- could push things out, mainly want to keep on track for
infrastructure changes (irc channel migration, git tree migration)
- savannah down
- git.qemu.org was mirror, will start pushing there
- when savannah is back up, will become mirror (so git users should
  still work)
- plan on moving #qemu to OFTC

nested VMX
- no progress, future plans are unclear

qemu users forum in grenoble
- worth having someone there
- goal to get embedded forks to push changes back to qemu

migration with large memory
- switching to 50ms cap likely to cause regression in terms of vcpu runtime
- 50ms qemu mutex contention, brief period of mutex access
- this has the effect of speeding up migration but giving too little vcpu
  access to qemu mutex (network connections could terminate, for example)
- only fixes to this are to use bw limit or not holding qemu mutex during
  migration
- run Anthony's test load and discuss on list

* Re: KVM call minutes for Nov 30
From: Anthony Liguori @ 2010-11-30 15:59 UTC
To: Chris Wright; +Cc: kvm, qemu-devel

On 11/30/2010 09:53 AM, Chris Wright wrote:
> 2011 KVM Conference
> - together with LF event like LinuxCon Vancouver BC (Aug), KS Prague (Nov)
> - wider audience
> - include qemu (tcg)
> - include libvirt
> - include xen
>
> 0.14.0 release plan
> - could push things out, mainly want to keep on track for
>
> infrastructure changes (irc channel migration, git tree migration)
> - savannah down
> - git.qemu.org was mirror, will start pushing there
> - when savannah is back up, will become mirror (so git users should
> still work)
> - plan on moving #qemu to OFTC
>
> nested VMX
> - no progress, future plans are unclear
>
> qemu users forum in grenoble
> - worth having someone there
> - goal to get embedded forks to push changes back to qemu
>
> migration with large memory
> - switching to 50ms cap likely to cause regression in terms of vcpu runtime
> - 50ms qemu mutex contention, brief period of mutex access
> - this has the effect of speeding up migration but giving too little vcpu
> access to qemu mutex (network connections could terminate, for example)
> - only fixes to this are to use bw limit or not holding qemu mutex during
> migration
>

Right, to restate this: for some workloads, a VCPU needs to access
qemu_mutex potentially for the majority of its execution. If we let
migration hold the mutex for 95% of the time, then even if we spread the
remaining 5% out over every 50ms, we only superficially avoid large
"stalls". We're still breaking the migration downtime contract.

The only solution is to limit the time migration is allowed to run, which
is effectively what bandwidth limiting does.

I'd be willing to entertain a bandwidth limit expressed in terms of % CPU,
although I think that's going to be a lot harder to compute than the
current bandwidth limit. And while setting a migration limit does increase
migration time, it's the only solution that preserves fairness unless we
stick migration into a separate thread.

Regards,

Anthony Liguori

> - run Anthony's test load and discuss on list
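
A rough way to see the fairness argument above: if migration holds
qemu_mutex for a fixed part of every slice, the VCPU's share of the mutex
is just the remainder, however finely the slices are cut. The sketch below
is only illustrative arithmetic in C, not QEMU code. The function names
are hypothetical, and it assumes the mutex is held exactly as long as it
takes to send one slice's byte budget at the link rate.

```c
#include <stdint.h>

/* Percent of each period left for VCPUs when migration holds the mutex
 * for hold_ms out of every period_ms (hypothetical helper). */
static unsigned vcpu_share_percent(unsigned hold_ms, unsigned period_ms)
{
    if (period_ms == 0 || hold_ms >= period_ms)
        return 0;
    return 100u * (period_ms - hold_ms) / period_ms;
}

/* Bytes migration may send in one slice under a bandwidth limit. */
static uint64_t slice_budget_bytes(uint64_t limit_bytes_per_sec,
                                   unsigned slice_ms)
{
    return limit_bytes_per_sec * slice_ms / 1000u;
}

/* Milliseconds (rounded up) needed to send budget_bytes at the given
 * link rate. Assumption: the mutex is held for exactly this long. */
static unsigned hold_ms_for_budget(uint64_t budget_bytes,
                                   uint64_t link_bytes_per_sec)
{
    if (link_bytes_per_sec == 0)
        return 0;
    return (unsigned)((budget_bytes * 1000u + link_bytes_per_sec - 1)
                      / link_bytes_per_sec);
}
```

With assumed numbers (a 32 MiB/s limit, a 125 MB/s link, 50 ms slices),
the budget takes about 14 ms of each slice and leaves the VCPU roughly 72%
of the time. A sender that holds the mutex for 95 ms of every 100 ms
leaves 5%, regardless of how the 5 ms is distributed.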

* Re: KVM call minutes for Nov 30
From: Nadav Har'El @ 2010-12-01 9:27 UTC
To: Chris Wright; +Cc: kvm

Hi,

On Tue, Nov 30, 2010, Chris Wright wrote about "KVM call minutes for Nov 30":
> nested VMX
> - no progress, future plans are unclear

Avi Kivity's request to discuss this issue came around an hour before the
call, and I missed it, so I wasn't on the call. Sorry.

I'm the only one doing any coding in nested VMX, so I suggest that the next
time you want to talk about it, you check that I'll be there ;-) Or even
better, just use the mailing list to discuss.

But more to the point, what I've been doing recently is implementing Avi's
and Gleb's review comments, which I'm taking seriously, but which often
(unsurprisingly) turn out to be a case of "easier said than done".
Specifically, in the last review I was asked to make sure that shadow-on-EPT
works so that users do not need to remember to add the "ept=0" module option
on L0. Unfortunately, while this should have been relatively simple (and it
DID work at some time in the distant past), there appears to be a bug that I
have spent the last couple of weeks chasing - so far unsuccessfully.

Regarding future plans:

I really want to get nested VMX into KVM, and I'm already doing whatever I
can to make this a reality. But unfortunately, I am not yet a seasoned KVM or
VMX expert (I'm trying to become one...), and it wasn't I who wrote the
original nested VMX code, so every new issue and every new feature that I am
asked to fix is new to me, and takes me considerable time to learn, debug,
and fix.

I hope to continue working on nested VMX next year as well, but IBM's (my
employer's) plans for next year are not yet set in stone. In any case, I
firmly believe that the nested VMX feature will be better, and be available
more quickly, if I am not the only person working on it. I will be very happy
if somebody reading this wants to work on this with me. This is the main
reason why I wanted to put nested VMX in the main tree (even before it is
100% bug-free and feature-complete), because that would make it much easier
for more people to try this feature, and hopefully to help me fix problems
which bother them.

Nadav.

-- 
Nadav Har'El                        |     Wednesday, Dec 1 2010, 24 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |"A mathematician is a device for turning
http://nadav.harel.org.il           |coffee into theorems" -- P. Erdos

* Re: KVM call minutes for Nov 30
From: Avi Kivity @ 2010-12-01 10:28 UTC
To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/01/2010 11:27 AM, Nadav Har'El wrote:
> I really want to get nested VMX into KVM, and I'm already doing whatever I
> can to make this a reality. But unfortunately, I am not yet a seasoned KVM or
> VMX expert (I'm trying to become one...), and it wasn't I who wrote the
> original nested VMX code, so every new issue and every new feature that I am
> asked to fix is new to me, and takes me considerable time to learn, debug,
> and fix.
>
> I hope to continue working on nested VMX, next year as well, but IBM's (my
> employer's) plans for next year are not yet set in stone. In any case, I
> firmly believe that the nested VMX feature will be better, and be available
> more quickly, if I am not the only person working on it. I will be very happy
> if somebody reading this wants to work on this with me. This is the main
> reason why I wanted to put nested VMX in the main tree (even before it is
> 100% bug-free and feature-complete), because that would make it much easier
> for more people to try this feature, and hopefully to help me fix problems
> which bother them.

I am reluctant to lower the bar on entry. Things like SMP and host EPT
support are really basic IMO. Good integration into KVM is also important.

If you would like to get more people participating, I suggest publishing a
git tree. People can then post patches against the tree, which you can
merge into the relevant commits. It would also get more testing done by
interested parties.

-- 
error compiling committee.c: too many arguments to function

* Re: KVM call minutes for Nov 30
From: Nadav Har'El @ 2010-12-06 19:39 UTC
To: Chris Wright; +Cc: kvm, avi

On Wed, Dec 01, 2010, Nadav Har'El wrote about "Re: KVM call minutes for Nov 30":
> Specifically, in the last review I was asked to make sure that shadow-on-EPT
> works so that users do not need to remember to add the "ept=0" module option
> on L0. Unfortunately, while this should have been relatively simple (and it
> DID work at some time in the distant past), there appears to be a bug that I
> have spent the last couple of weeks chasing - so far unsuccessfully.

I was finally able to track this bug down. The issue was incorrect setup of
the four PDPTE (which are called PDPTR in KVM) fields in vmcs02.
These fields are important to set up correctly when using EPT and PAE.
For some reason I have yet to understand, KVM (as an L1) appears to be
setting the PAE bit in its guest. The previous code copied these fields from
vmcs01 to vmcs02, but this is incorrect because these fields need to be
recalculated for each cr3, and the GUEST_CR3 used for running L2 (the shadow
page table set up by L1) is different from the GUEST_CR3 used to run L1
(this one is defined by L1, and untouched by L0 because L0 uses EPT).
We need to emulate what the processor does on a cr3 change when EPT and
PAE are both enabled - i.e., dereference the cr3 value (this requires an EPT
translation) and find the four pointers to be saved in the PDPTR fields.
I have done this, and the shadow-on-EPT case finally works, and there is
no need to use ept=0 on L0 any more :-)

I'm curious, though: why does KVM set PAE for its guest? What is setting
PAE supposed to do while the guest is booting (and thinks it is running in
real mode)? What is setting PAE supposed to do while the guest is running
in long mode?

-- 
Nadav Har'El                        |        Monday, Dec 6 2010, 30 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Amateurs built the ark - professionals
http://nadav.harel.org.il           |built the Titanic.
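
The step described above (dereference cr3 and fetch the four pointers) can
be sketched in plain C. This is a simplified model, not the KVM code: in
PAE paging, bits 31:5 of CR3 give the 32-byte-aligned base of a table of
four 8-byte PDPTEs. The `guest_read_fn` callback, `struct flat_mem`, and
`flat_read` are hypothetical stand-ins for whatever guest-physical access
(and second-level translation) a real hypervisor would perform. Validation
of reserved bits is omitted, and the sketch assumes a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PDPTE_COUNT 4
/* In PAE paging, CR3 bits 31:5 hold the page-directory-pointer table base. */
#define CR3_PAE_BASE_MASK 0xFFFFFFE0u

/* Hypothetical accessor: copy len bytes of guest-physical memory at gpa
 * into dst. Returns 0 on success, nonzero on failure. */
typedef int (*guest_read_fn)(void *ctx, uint64_t gpa, void *dst, size_t len);

/* Read the four PDPTEs that CR3 points at. On failure, out is untouched. */
static int load_pae_pdptes(uint64_t cr3, guest_read_fn read_guest, void *ctx,
                           uint64_t out[PDPTE_COUNT])
{
    uint64_t base = cr3 & CR3_PAE_BASE_MASK;
    uint64_t tmp[PDPTE_COUNT];

    if (read_guest(ctx, base, tmp, sizeof(tmp)) != 0)
        return -1;
    memcpy(out, tmp, sizeof(tmp));
    return 0;
}

/* Toy backend: guest memory is one flat buffer, no translation at all. */
struct flat_mem {
    const uint8_t *bytes;
    size_t size;
};

static int flat_read(void *ctx, uint64_t gpa, void *dst, size_t len)
{
    const struct flat_mem *m = ctx;

    if (gpa > m->size || len > m->size - gpa)
        return -1;
    memcpy(dst, m->bytes + gpa, len);
    return 0;
}
```

The point of the callback is that the PDPTEs must be read through the
address space that the new cr3 belongs to, which is why values copied from
another VMCS (built for a different cr3) are stale.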

* Re: KVM call minutes for Nov 30
From: Avi Kivity @ 2010-12-07 8:34 UTC
To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/06/2010 09:39 PM, Nadav Har'El wrote:
> On Wed, Dec 01, 2010, Nadav Har'El wrote about "Re: KVM call minutes for Nov 30":
> > Specifically, in the last review I was asked to make sure that shadow-on-EPT
> > works so that users do not need to remember to add the "ept=0" module option
> > on L0. Unfortunately, while this should have been relatively simple (and it
> > DID work at some time in the distant past), there appears to be a bug that I
> > have spent the last couple of weeks chasing - so far unsuccessfully.
>
> I was finally able to track this bug down. The issue was incorrect setup of
> the four PDPTE (which are called PDPTR in KVM) fields in vmcs02.
> These fields are important to set up correctly when using EPT and PAE.
> For some reason I have yet to understand, KVM (as an L1) appears to be
> setting the PAE bit in its guest. The previous code copied these fields from
> vmcs01 to vmcs02, but this is incorrect because these fields need to be
> recalculated for each cr3, and the GUEST_CR3 used for running L2 (the shadow
> page table set up by L1) is different from the GUEST_CR3 used to run L1
> (this one is defined by L1, and untouched by L0 because L0 uses EPT).
> We need to emulate what the processor does on a cr3 change when EPT and
> PAE are both enabled - i.e., dereference the cr3 value (this requires an EPT
> translation) and find the four pointers to be saved in the PDPTR fields.
> I have done this, and the shadow-on-ept case finally works, and there is
> no need to use ept=0 on L0 any more :-)

Great. I imagine the fixed code is also simpler.

I don't follow what you mean by "this requires an EPT translation". All it
requires is a kvm_set_cr3(), which will load the PDPTEs into the PDPTRs if
PAE is enabled. You may need to order the loading of CR0, CR3, CR4, and
EFER to achieve the desired effect.

> I'm curious, though, why does KVM set PAE for its guest? What does setting
> PAE supposed to do while the guest is booting (and thinks it is running in
> real mode)?

PAE is needed to access >4G of memory. Otherwise the PTEs are 32 bits long
and cannot reference all of host memory.

> What does setting PAE supposed to do while the guest is running
> in long mode?

PAE is required by the processor for long mode (independently of
virtualization).

-- 
error compiling committee.c: too many arguments to function
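
The answers above can be summarised as a small decision function. This is
a sketch in plain C with made-up names (`classify_paging`, `uses_pdptrs`),
not KVM code. It relies only on the architectural bit positions CR0.PG
(bit 31), CR4.PAE (bit 5), and EFER.LME (bit 8), and on the rule that
paging with LME set requires PAE.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG  (1u << 31)  /* paging enable */
#define X86_CR4_PAE (1u << 5)   /* physical address extension */
#define EFER_LME    (1u << 8)   /* long mode enable */

enum paging_mode {
    PAGING_OFF,      /* CR0.PG clear: CR4.PAE has no effect on translation */
    PAGING_32BIT,    /* 32-bit entries */
    PAGING_PAE,      /* 64-bit entries, four PDPTEs loaded from CR3 */
    PAGING_LONG,     /* long mode paging; requires CR4.PAE */
    PAGING_INVALID   /* LME with paging but without PAE is not allowed */
};

static enum paging_mode classify_paging(uint64_t cr0, uint64_t cr4,
                                        uint64_t efer)
{
    bool pg  = (cr0 & X86_CR0_PG) != 0;
    bool pae = (cr4 & X86_CR4_PAE) != 0;
    bool lme = (efer & EFER_LME) != 0;

    if (!pg)
        return PAGING_OFF;
    if (lme)
        return pae ? PAGING_LONG : PAGING_INVALID;
    return pae ? PAGING_PAE : PAGING_32BIT;
}

/* Only PAE paging outside long mode uses the four PDPTE registers. */
static bool uses_pdptrs(enum paging_mode mode)
{
    return mode == PAGING_PAE;
}
```

Under this model, a guest with PAE set but paging disabled (the boot case
asked about) lands in PAGING_OFF, and the PDPTR fields only matter in the
PAGING_PAE case.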

* Re: KVM call minutes for Nov 30
From: Nadav Har'El @ 2010-12-07 12:49 UTC
To: Avi Kivity; +Cc: Chris Wright, kvm

On Tue, Dec 07, 2010, Avi Kivity wrote about "Re: KVM call minutes for Nov 30":
>...
> All it requires is a
> kvm_set_cr3() which will load the PDPTEs into the PDPTRs if PAE is
> enabled. You may need to order the loading of CR0, CR3, CR4, and EFER
> to achieve the desired effect.

I did this more explicitly as:

	vmcs_writel(GUEST_CR3, get_vmcs12_fields(vcpu)->guest_cr3);
	vcpu->arch.cr3 = get_vmcs12_fields(vcpu)->guest_cr3;
	load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
	vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
	vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
	vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
	vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);

I'm still working on trying to simplify this code - I'll indeed try to see if
I can use kvm_set_cr3 instead. Thanks for the suggestion.
However, even if it works, I have a concern about how nested vmx might break
in the future if kvm_set_cr3 is changed in some way that is irrelevant to
nested.

Nadav.

-- 
Nadav Har'El                        |       Tuesday, Dec 7 2010, 30 Kislev 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
http://nadav.harel.org.il           |how to make AOL's Free CD's edible!

* Re: KVM call minutes for Nov 30
From: Avi Kivity @ 2010-12-07 13:10 UTC
To: Nadav Har'El; +Cc: Chris Wright, kvm

On 12/07/2010 02:49 PM, Nadav Har'El wrote:
> On Tue, Dec 07, 2010, Avi Kivity wrote about "Re: KVM call minutes for Nov 30":
> >...
> > All it requires is a
> > kvm_set_cr3() which will load the PDPTEs into the PDPTRs if PAE is
> > enabled. You may need to order the loading of CR0, CR3, CR4, and EFER
> > to achieve the desired effect.
>
> I did this more explicitly as:
>
> 	vmcs_writel(GUEST_CR3, get_vmcs12_fields(vcpu)->guest_cr3);
> 	vcpu->arch.cr3 = get_vmcs12_fields(vcpu)->guest_cr3;
> 	load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
> 	vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
> 	vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
> 	vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
> 	vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);
>
> I'm still working on trying to simplify this code - I'll indeed try to see if
> I can use kvm_set_cr3 instead. Thanks for the suggestion.
> However, Even if it works, I have a concern on how nested vmx might brake in
> the future if kvm_set_cr3 is changed in some way that is irrelevant to nested.

It's more correct to use kvm_set_cr3(), since that accounts for all side
effects. For example, unsynchronized shadow mmu pages need to be synced.
If there are new side effects that we don't want in nesting, then we'll
add a flag to avoid them.

You also need to do the same on the vmexit path (kvm_set_cr3(HOST_CR3)).
This is what svm does (though only for !npt; but it should also work
unconditionally).

-- 
error compiling committee.c: too many arguments to function
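
The design argument above (one setter that owns every side effect, plus an
opt-out flag if a side effect is ever unwanted) can be illustrated with a
toy model. Everything here is hypothetical: `struct toy_vcpu`,
`toy_set_cr3`, and `TOY_CR3_NO_SYNC` are invented for the sketch and do
not correspond to KVM's real structures or to kvm_set_cr3()'s signature.
The counters merely stand in for the real work of reloading PDPTRs and
syncing shadow pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opt-out flag, in the spirit of "we'll add a flag". */
#define TOY_CR3_NO_SYNC 0x1u

struct toy_vcpu {
    uint64_t cr3;
    bool pae_paging;       /* PAE paging active (outside long mode) */
    unsigned pdptr_loads;  /* stand-in for reloading the four PDPTRs */
    unsigned mmu_syncs;    /* stand-in for syncing shadow MMU pages */
};

/* The single place that knows every side effect of a CR3 change. */
static void toy_set_cr3(struct toy_vcpu *v, uint64_t cr3, unsigned flags)
{
    if (v->pae_paging)
        v->pdptr_loads++;
    if (!(flags & TOY_CR3_NO_SYNC))
        v->mmu_syncs++;
    v->cr3 = cr3;
}

/* Nested entry and exit both go through the same setter, so a side
 * effect added to toy_set_cr3() later reaches both paths automatically. */
static void toy_nested_entry(struct toy_vcpu *v, uint64_t l2_cr3)
{
    toy_set_cr3(v, l2_cr3, 0);
}

static void toy_nested_exit(struct toy_vcpu *v, uint64_t l1_cr3)
{
    toy_set_cr3(v, l1_cr3, 0);
}
```

An open-coded sequence of register writes on the entry path would have to
be mirrored by hand on the exit path and kept in step with every future
change to the setter, which is the maintenance risk the shared function
avoids.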