On 11/12/2011 11:37 PM, Nadav Har'El wrote:
> On Sat, Nov 12, 2011, Avi Kivity wrote about "Re: [PATCH 02/10] nEPT: MMU context for nested EPT":
>> host may write-protect a page. Second, the shadow and guest ptes may be
>> in different formats (ept vs ia32).
>
> I'm afraid I've lost you here... The shadow table and the to-be-shadowed
> table are both ia32 (this is the normal shadow table code), or both ept
> (the nested tdp code). When are they supposed to be in different
> formats (ept vs ia32)?
>
> I'm also puzzled: in what situation will the host write-protect an EPT02
> (shadow EPT) page?
>
>> In fact that happens to accidentally work, no? Intermediate ptes are
>> always present/write/user, which translates to read/write/execute in EPT.
>
> It didn't work because it also used to set the "accessed" bit, bit 5,
> which on EPT is reserved and caused an EPT misconfiguration. So I had to
> fix link_shadow_page, or nested EPT would not work at all.
>
>> Don't optimize for least changes, optimize for best result afterwards.
>
> As I'm sure you remember, two years ago, on September 6, 2009, you wrote in
> your blog about the newly contributed nested VMX patch set, and in
> particular its nested EPT (which predated the nested NPT contribution).
>
> Nested EPT was, for some workloads, a huge performance improvement, but
> you (if I understand correctly) did not want that code in KVM because
> it, basically, optimized for getting the job done, in the most correct
> and most efficient manner - but without regard for how cleanly this fit with
> other types of shadowing (normal shadow page tables, and nested NPT),
> or how much of the code was being duplicated or circumvented.
>
> So this time around, I couldn't really "not optimize for least changes".
> This time, the nested EPT had to fit (like a square peg in a round hole
> ;-)), into the preexisting MMU and NPT shadowing.
> I couldn't really just write the most correct and most efficient code
> (which Orit Wasserman already did, two years earlier). This time I needed
> to figure out the least obtrusive way of changing the existing code.
> The hardest thing about doing this
> was trying to understand all the complexities and subtleties of the existing
> MMU code in KVM, which already handles 101 different cases in one
> overloaded piece of code, which is not commented or documented.
> And of course, add to that all the complexities (some might even say "cruft")
> which the underlying x86 architecture itself has accrued over the years.
> So it's not surprising I've missed some of the important subtleties which
> didn't have any effect in the typical case I've tried. Like I said, in my
> tests nested EPT *did* work. And even getting to that point was hard enough :-)
>
>> We need a third variant of walk_addr_generic that parses EPT format
>> PTEs. Whether that's best done by writing paging_ept.h or modifying
>> paging_tmpl.h, I don't know.
>
> Thanks. I'll think about everything you've said in this thread (I'm still
> not convinced I understood all your points, so just understanding them
> will be the first step). I'll see what I can do to improve the patch.
>
> But I have to be honest - I'm not sure how quickly I can finish this.
> I really appreciate all your comments about nested VMX in the last two
> years - most of them have been spot-on, 100% correct, and really helpful
> for making me understand things which I had previously misunderstood.
> However, since you are (of course) extremely familiar with every nook and
> cranny of KVM, what normally happens is that every comment which took you
> 5 minutes to figure out, takes me 5 days to fully understand, and to actually
> write, debug and test the fixed code. Every review that takes you two days
> to go through (and is very much appreciated!) takes me several months to fix
> each and every thing you asked for.
>
> Don't get me wrong, I *am* planning to continue working (part-time) on nested
> VMX, and nested EPT in particular. But if you want it to pick up the pace,
> I could use some help with actual coding from people who have much more
> intimate knowledge of the non-nested-VMX parts of KVM than I have.
>
> In the meantime, if anybody wants to experiment with a much faster
> Nested VMX than we had before, you can try my current patch. It may not
> be perfect, but in many ways it is better than the old shadow-on-ept code.
> And in simple (64 bit, 4k page) kvm-over-kvm configurations like I tried, it
> works well.
>
> Nadav.
>

Maybe this patch can help; this is roughly what Avi wants (I hope), done
very quickly. I'm sorry, I don't have a setup to run nested VMX at the
moment, so I can't test it.

Orit