* KVM call minutes for Sept 21
@ 2010-09-21 18:05 Chris Wright
2010-09-21 18:23 ` Anthony Liguori
2010-09-22 0:04 ` Nadav Har'El
0 siblings, 2 replies; 22+ messages in thread
From: Chris Wright @ 2010-09-21 18:05 UTC (permalink / raw)
To: kvm; +Cc: qemu-devel
Nested VMX
- looking for forward progress and better collaboration between the
Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
- merge baseline patch
- looks pretty good
- review is finding mostly small things at this point
- need some correctness verification (both review from Intel and testing)
- need a test suite
- test suite harness will help here
- a few dozen nested SVM tests are there, can follow for nested VMX
- nested EPT
- optimize (reduce vmreads and vmwrites)
- has long term maintan
Hotplug
- command...guest may or may not respond
- guest can't be trusted to be direct part of request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a
single unplug)
- should be a GUI interface design decision, human monitor is not a
good design point
- digression into GUI interface
Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
- live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
- O_DSYNC needed based on whether disk cache is available
- also issues with sparse files (e.g. O_DIRECT to unallocated extent)
- how to manage w/out needing to flush every write, slow
- perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
- works well with fully allocated file, dependent on disk cache disable
(or fs specific flushing)
- filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
- set up host backend accordingly
- be nice to preserve write cache settings over boot (outgrowing cmos storage)
- maybe some host fs-level optimization possible
- e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion
- one direct user tunable, "use host page cache or not"
- one guest OS tunable, "enable disk cache"
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: KVM call minutes for Sept 21
From: Anthony Liguori @ 2010-09-21 18:23 UTC (permalink / raw)
To: Chris Wright; +Cc: kvm, qemu-devel
On 09/21/2010 01:05 PM, Chris Wright wrote:
> Nested VMX
> - looking for forward progress and better collaboration between the
> Intel and IBM teams
> - needs more review (not a new issue)
> - use cases
> - work todo
> - merge baseline patch
> - looks pretty good
> - review is finding mostly small things at this point
> - need some correctness verification (both review from Intel and testing)
> - need a test suite
> - test suite harness will help here
> - a few dozen nested SVM tests are there, can follow for nested VMX
> - nested EPT
> - optimize (reduce vmreads and vmwrites)
> - has long term maintan
>
> Hotplug
> - command...guest may or may not respond
> - guest can't be trusted to be direct part of request/response loop
> - solve at QMP level
> - human monitor issues (multiple successive commands to complete a
> single unplug)
> - should be a GUI interface design decision, human monitor is not a
> good design point
> - digression into GUI interface
>
The way this works IRL is:
1) Administrator presses a physical button. This sends an ACPI
notification to the guest.
2) The guest makes a decision about how to handle the ACPI notification.
3) To initiate unplug, the guest disables the device and performs an
operation to indicate to the PCI bus that the device is unloaded.
4) Step (3) causes an LED (usually near the button from step 1) to change colors.
5) Administrator then physically removes the device.
So we need at least a QMP command to perform step (1). Since (3) can
occur independently of (1), it should be an async notification.
device_del should only perform step (5).
A management tool needs to:
pci_unplug_request <slot>
/* wait for PCI_UNPLUGGED event */
device_del <slot>
netdev_del <backend>
> Drive caching
> - need to formalize the meanings in terms of data integrity guarantees
> - guest write cache (does it directly reflect the host write cache?)
> - live migration, underlying block dev changes, so need to decouple the two
> - O_DIRECT + O_DSYNC
> - O_DSYNC needed based on whether disk cache is available
> - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
> - how to manage w/out needing to flush every write, slow
> - perhaps start with O_DIRECT on raw, non-sparse files only?
> - backend needs to open backing store matching the guest's disk cache state
> - O_DIRECT itself has inconsistent integrity guarantees
> - works well with fully allocated file, dependent on disk cache disable
> (or fs specific flushing)
> - filesystem specific warnings (ext4 w/ barriers on, btrfs)
> - need to be able to open w/ O_DSYNC depending on guest's write cache mode
> - make write cache visible to guest (need a knob for this)
> - qemu default is cache=writethrough, do we need to revisit that?
> - just present user with option whether or not to use host page cache
> - allow guest OS to choose disk write cache setting
> - set up host backend accordingly
> - be nice to preserve write cache settings over boot (outgrowing cmos storage)
> - maybe some host fs-level optimization possible
> - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
> - conclusion
> - one direct user tunable, "use host page cache or not"
> - one guest OS tunable, "enable disk cache"
>
IOW, a qdev 'write-cache=on|off' property and a blockdev 'direct=on|off'
property. For completeness, a blockdev 'unsafe=on|off' property.
Open flags are:
write-cache=on, direct=on O_DIRECT
write-cache=off, direct=on O_DIRECT | O_DSYNC
write-cache=on, direct=off 0
write-cache=off, direct=off O_DSYNC
It's still unclear what our default mode will be.
The problem is, O_DSYNC has terrible performance on ext4 when barrier=1.
write-cache=on,direct=off is a bad default because if you do a simple
performance test, you'll get better than native and that upsets people.
write-cache=off,direct=off is a bad default because ext4's default
config sucks with this.
Likewise, write-cache=off, direct=on is a bad default for the same reason.
Regards,
Anthony Liguori
* Re: KVM call minutes for Sept 21
From: Nadav Har'El @ 2010-09-22 0:04 UTC (permalink / raw)
To: Chris Wright; +Cc: kvm, avi
Hi, thanks for the summary.
I also listened in on the call. I'm glad these issues are being discussed.
On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> Nested VMX
> - looking for forward progress and better collaboration between the
> Intel and IBM teams
I'll be very happy if anyone, be it from Intel or somewhere else, would like
to help me work on nested VMX.
Somebody (I don't recognize your voices yet, sorry...) mentioned on the call
that there might not be much point in cooperation before I finish getting
nested VMX merged into KVM. I agree, but my conclusion is different from what
I think the speaker implied: My conclusion is that it is important that we
merge the nested VMX code into KVM as soon as possible, because if nested VMX
is part of KVM (and not a set of patches which becomes stale the moment after
I release it) this will make it much easier for people to test it, use it,
and cooperate in developing it.
> - needs more review (not a new issue)
I think the reviews that nested VMX has received over the past year (thanks
to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others), have been
fantastic. You guys have shown deep understanding of the code, and found
numerous bugs, oversights, missing features, and also a fair share of ugly
code, and we (first Orit and Abel, and then I) have done our best to fix all
of these issues. I've personally learned a lot from the latest round of
reviews, and the discussions with you.
So I don't think there has been any lack of reviews. I don't think that
getting more reviews is the most important task ahead of us.
Surely, if more people review the code, more potential bugs will be spotted.
But this is always the case, with any software. I think the question now
is, what would it take to finally declare the code as "good enough to be
merged", with the understanding that even after being merged it will still be
considered an experimental feature, disabled by default and documented as
experimental. Nested SVM was also merged before it was perfect, and also
KVM itself was released before being perfect :-)
> - use cases
I don't kid myself that as soon as nested VMX is available in KVM, millions
of users worldwide will flock to use it. Definitely, many KVM users will never
find a need for nested virtualization. But I do believe that there are many
use cases. We outlined some of them in our paper (to be presented in a couple
of weeks in OSDI):
1. Hosting one of the new breed of operating systems which have a hypervisor
as part of them. Windows 7 with XP mode is one example. Linux with KVM
is another.
2. Platforms with embedded hypervisors in firmware need nested virt to
run any workload - which can itself be a hypervisor with guests.
3. Cloud users could put in their virtual machine a hypervisor with
sub-guests, and run multiple virtual machines on the one virtual machine
which they get.
4. Enable live migration of entire hypervisors with their guests - for
load balancing, disaster recovery, and so on.
5. Honeypots and protection against hypervisor-level rootkits
6. Make it easier to test, demonstrate, benchmark and debug hypervisors,
and also entire virtualization setups. An entire virtualization setup
(hypervisor and all its guests) could be run as one virtual machine,
allowing testing many such setups on one physical machine.
By the way, I find the question of "why do we need nested VMX" a bit odd,
seeing that KVM already supports nested virtualization (for SVM). Is it the
case that nested virtualization was found useful on AMD processors, but for
Intel processors, it isn't? Of course not :-) I think KVM should support
nested virtualization on neither architecture, or on both - and of course
I think it should be on both :-)
> - work todo
> - merge baseline patch
> - looks pretty good
> - review is finding mostly small things at this point
> - need some correctness verification (both review from Intel and testing)
> - need a test suite
> - test suite harness will help here
> - a few dozen nested SVM tests are there, can follow for nested VMX
> - nested EPT
I've been keeping track of the issues remaining from the last review, and
indeed only a few remain. Only 8 of the 24 patches have any outstanding
issue, and I'm working on those that remain, as you could see on the mailing
list in the last couple of weeks. If there's interest, I can even summarize
these remaining issues.
But since I'm working on these patches alone, I think we need to define our
priorities. Most of the outstanding review comments, while absolutely correct
(and I was amazed by the quality of the reviewer's comments), deal with
re-writing code that already works (to improve its style) or fixing relatively
rare cases. It is not clear that these issues are more important than the
other things listed in the summary above (test suite, nested EPT), but as
long as I continue to rewrite pieces of the nested VMX code, I'll never get
to those other important things.
To summarize, I'd love for us to define some sort of plan or roadmap on
what we (or I) need to do before we can finally merge the nested VMX code
into KVM. I would love for this roadmap to be relatively short, leaving
some of the outstanding issues to be done after the merge.
> - optimize (reduce vmreads and vmwrites)
Before we implemented nested VMX, we also feared that the exits on vmreads and
vmwrites would kill performance. As you can see in our paper (see preprint
in http://nadav.harel.org.il/papers/nested-osdi10.pdf), we actually showed
that this is not the case - while these extra exits do hurt performance,
in common workloads (i.e., not pathological worst-case scenarios), the
trapping vmread/vmwrite only moderately hurt performance. For example, with
kernbench the nested overhead (over single-level virtualization) was 14.5%,
which could have been reduced to 10.3% if vmread/vmwrite didn't trap.
For the SPECjbb workloads, the numbers are 7.8% vs. 6.3%. As you can see,
the numbers would be better if it weren't for the L1 vmread/vmwrites trapping,
but the difference is not huge. Certainly we can start with a version that
doesn't do anything about this issue.
So I don't think there is any urgent need to optimize nested VMX (the L0)
or the behavior of KVM as L1. Of course, there's always a long-term desire
to continue optimizing it.
> - has long term maintan
We have been maintaining this patch set for well over a year now, so I think
we've shown long term interest in maintaining it, even across personnel
changes. In any case, it would have been much easier for us - and for other
people - to maintain this patch if it was part of KVM, and we wouldn't need
to take care of rebasing when KVM changes.
Thanks,
Nadav.
--
Nadav Har'El | Wednesday, Sep 22 2010, 14 Tishri 5771
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Linux *is* user-friendly. Not
http://nadav.harel.org.il |idiot-friendly, but user-friendly.
* Re: KVM call minutes for Sept 21
From: Chris Wright @ 2010-09-22 1:48 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm, avi
* Nadav Har'El (nyh@math.technion.ac.il) wrote:
> On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> > Nested VMX
> > - looking for forward progress and better collaboration between the
> > Intel and IBM teams
>
> I'll be very happy if anyone, be it from Intel or somewhere else, would like
> to help me work on nested VMX.
>
> Somebody (I don't recognize your voices yet, sorry...) mentioned on the call
> that there might not be much point in cooperation before I finish getting
> nested VMX merged into KVM.
My recollection...it was Avi.
> I agree, but my conclusion is different from what
> I think the speaker implied: My conclusion is that it is important that we
> merge the nested VMX code into KVM as soon as possible, because if nested VMX
> is part of KVM (and not a set of patches which becomes stale the moment after
> I release it) this will make it much easier for people to test it, use it,
> and cooperate in developing it.
Yup. And especially for follow-on work (like nested EPT). Makes sense
to merge and build from merged base rather than have out-of-tree patchset
continue to grow and grow.
> > - needs more review (not a new issue)
>
> I think the reviews that nested VMX has received over the past year (thanks
> to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others), have been
> fantastic. You guys have shown deep understanding of the code, and found
> numerous bugs, oversights, missing features, and also a fair share of ugly
> code, and we (first Orit and Abel, and then I) have done our best to fix all
> of these issues. I've personally learned a lot from the latest round of
> reviews, and the discussions with you.
>
> So I don't think there has been any lack of reviews. I don't think that
> getting more reviews is the most important task ahead of us.
At earlier points of review there were issues considered fundamental
that needed to be fixed before merging (SMP and proper VMPTRLD emulation
springs to mind). Now it seems it's down to smaller, more targeted
issues. Some hesitancy is based on the complexity of the patches.
So more review helps...a test harness does too. Anything to build Avi's
confidence to merge the code ;)
> Surely, if more people review the code, more potential bugs will be spotted.
> But this is always the case, with any software. I think the question now
> is, what would it take to finally declare the code as "good enough to be
> merged", with the understanding that even after being merged it will still be
> considered an experimental feature, disabled by default and documented as
> experimental. Nested SVM was also merged before it was perfect, and also
> KVM itself was released before being perfect :-)
;)
> > - use cases
>
> I don't kid myself that as soon as nested VMX is available in KVM, millions
> of users worldwide will flock to use it. Definitely, many KVM users will never
> find a need for nested virtualization. But I do believe that there are many
> use cases. We outlined some of them in our paper (to be presented in a couple
> of weeks in OSDI):
>
> 1. Hosting one of the new breed of operating systems which have a hypervisor
> as part of them. Windows 7 with XP mode is one example. Linux with KVM
> is another.
>
> 2. Platforms with embedded hypervisors in firmware need nested virt to
> run any workload - which can itself be a hypervisor with guests.
>
> 3. Cloud users could put in their virtual machine a hypervisor with
> sub-guests, and run multiple virtual machines on the one virtual machine
> which they get.
>
> 4. Enable live migration of entire hypervisors with their guests - for
> load balancing, disaster recovery, and so on.
>
> 5. Honeypots and protection against hypervisor-level rootkits
>
> 6. Make it easier to test, demonstrate, benchmark and debug hypervisors,
> and also entire virtualization setups. An entire virtualization setup
> (hypervisor and all its guests) could be run as one virtual machine,
> allowing testing many such setups on one physical machine.
>
> By the way, I find the question of "why do we need nested VMX" a bit odd,
> seeing that KVM already supports nested virtualization (for SVM). Is it the
> case that nested virtualization was found useful on AMD processors, but for
> Intel processors, it isn't? Of course not :-) I think KVM should support
> nested virtualization on neither architecture, or on both - and of course
> I think it should be on both :-)
People keep looking for reasons to justify the cost of the effort, dunno
why "because it's cool" isn't good enough ;) At any rate, that was mainly
a question of how it might be useful for production kind of environments.
> > - work todo
> > - merge baseline patch
> > - looks pretty good
> > - review is finding mostly small things at this point
> > - need some correctness verification (both review from Intel and testing)
> > - need a test suite
> > - test suite harness will help here
> > - a few dozen nested SVM tests are there, can follow for nested VMX
> > - nested EPT
>
> I've been keeping track of the issues remaining from the last review, and
> indeed only a few remain. Only 8 of the 24 patches have any outstanding
> issue, and I'm working on those that remain, as you could see on the mailing
> list in the last couple of weeks. If there's interest, I can even summarize
> these remaining issues.
If there are remaining issues that could be done by someone else, this
might be helpful. Otherwise, probably only useful to you ;)
> But since I'm working on these patches alone, I think we need to define our
> priorities. Most of the outstanding review comments, while absolutely correct
> (and I was amazed by the quality of the reviewer's comments), deal with
> re-writing code that already works (to improve its style) or fixing relatively
> rare cases. It is not clear that these issues are more important than the
> other things listed in the summary above (test suite, nested EPT), but as
> long as I continue to rewrite pieces of the nested VMX code, I'll never get
> to those other important things.
>
> To summarize, I'd love for us to define some sort of plan or roadmap on
> what we (or I) need to do before we can finally merge the nested VMX code
> into KVM. I would love for this roadmap to be relatively short, leaving
> some of the outstanding issues to be done after the merge.
>
> > - optimize (reduce vmreads and vmwrites)
>
> Before we implemented nested VMX, we also feared that the exits on vmreads and
> vmwrites would kill performance. As you can see in our paper (see preprint
> in http://nadav.harel.org.il/papers/nested-osdi10.pdf), we actually showed
> that this is not the case - while these extra exits do hurt performance,
> in common workloads (i.e., not pathological worst-case scenarios), the
> trapping vmread/vmwrite only moderately hurt performance. For example, with
> kernbench the nested overhead (over single-level virtualization) was 14.5%,
> which could have been reduced to 10.3% if vmread/vmwrite didn't trap.
> For the SPECjbb workloads, the numbers are 7.8% vs. 6.3%. As you can see,
> the numbers would be better if it weren't for the L1 vmread/vmwrites trapping,
> but the difference is not huge. Certainly we can start with a version that
> doesn't do anything about this issue.
>
> So I don't think there is any urgent need to optimize nested VMX (the L0)
> or the behavior of KVM as L1. Of course, there's always a long-term desire
> to continue optimizing it.
>
> > - has long term maintan
>
> We have been maintaining this patch set for well over a year now, so I think
> we've shown long term interest in maintaining it, even across personnel
> changes. In any case, it would have been much easier for us - and for other
> people - to maintain this patch if it was part of KVM, and we wouldn't need
> to take care of rebasing when KVM changes.
Sorry, I was typing too quickly. That's a half-finished note which
should read:
- has long term maintenance issues
And that means that there are two halves to the feature. One is the nested
VMX code itself, for example each of the new EXIT_REASON_VM* handlers. The
other is the glue to the rest of KVM, for example, doing interrupt injection
optimally. Both have long term maintenance issues, but adding complexity
to core KVM was the context here.
thanks,
-chris
* Re: KVM call minutes for Sept 21
From: Gleb Natapov @ 2010-09-22 9:02 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 02:04:38AM +0200, Nadav Har'El wrote:
> Hi, thanks for the summary.
> I also listened-in on the call. I'm glad these issues are being discussed.
>
> On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> > Nested VMX
> > - looking for forward progress and better collaboration between the
> > Intel and IBM teams
>
> I'll be very happy if anyone, be it from Intel or somewhere else, would like
> to help me work on nested VMX.
>
> Somebody (I don't recognize your voices yet, sorry...) mentioned on the call
> that there might not be much point in cooperation before I finish getting
> nested VMX merged into KVM. I agree, but my conclusion is different from what
> I think the speaker implied: My conclusion is that it is important that we
> merge the nested VMX code into KVM as soon as possible, because if nested VMX
> is part of KVM (and not a set of patches which becomes stale the moment after
> I release it) this will make it much easier for people to test it, use it,
> and cooperate in developing it.
>
> > - needs more review (not a new issue)
>
> I think the reviews that nested VMX has received over the past year (thanks
> to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others), have been
> fantastic. You guys have shown deep understanding of the code, and found
> numerous bugs, oversights, missing features, and also a fair share of ugly
> code, and we (first Orit and Abel, and then I) have done our best to fix all
> of these issues. I've personally learned a lot from the latest round of
> reviews, and the discussions with you.
>
> So I don't think there has been any lack of reviews. I don't think that
> getting more reviews is the most important task ahead of us.
>
> Surely, if more people review the code, more potential bugs will be spotted.
> But this is always the case, with any software. I think the question now
> is, what would it take to finally declare the code as "good enough to be
> merged", with the understanding that even after being merged it will still be
> considered an experimental feature, disabled by default and documented as
> experimental. Nested SVM was also merged before it was perfect, and also
> KVM itself was released before being perfect :-)
>
There is only one outstanding serious issue from my point of view: event
injection path. I want it to be similar to how nested SVM handles it. I
don't see why it can't be done the same way for VMX too. The way nested SVM
does it looks cleaner and making code paths similar will allow us to
consolidate the logic in common code later. This issue is too
fundamental to be fixed after merge IMHO. Other nitpicks, about missing
checks that real HW does but the emulation doesn't, can be fixed any time
after merge.
> > - use cases
>
> I don't kid myself that as soon as nested VMX is available in KVM, millions
> of users worldwide will flock to use it. Definitely, many KVM users will never
> find a need for nested virtualization. But I do believe that there are many
> use cases. We outlined some of them in our paper (to be presented in a couple
> of weeks in OSDI):
>
> 1. Hosting one of the new breed of operating systems which have a hypervisor
> as part of them. Windows 7 with XP mode is one example. Linux with KVM
> is another.
>
> 2. Platforms with embedded hypervisors in firmware need nested virt to
> run any workload - which can itself be a hypervisor with guests.
>
> 3. Cloud users could put in their virtual machine a hypervisor with
> sub-guests, and run multiple virtual machines on the one virtual machine
> which they get.
>
> 4. Enable live migration of entire hypervisors with their guests - for
> load balancing, disaster recovery, and so on.
>
> 5. Honeypots and protection against hypervisor-level rootkits
>
> 6. Make it easier to test, demonstrate, benchmark and debug hypervisors,
> and also entire virtualization setups. An entire virtualization setup
> (hypervisor and all its guests) could be run as one virtual machine,
> allowing testing many such setups on one physical machine.
>
> By the way, I find the question of "why do we need nested VMX" a bit odd,
> seeing that KVM already supports nested virtualization (for SVM). Is it the
> case that nested virtualization was found useful on AMD processors, but for
> Intel processors, it isn't? Of course not :-) I think KVM should support
> nested virtualization on neither architecture, or on both - and of course
> I think it should be on both :-)
I think the question was "why do we need nested virtualization" ;)
--
Gleb.
* Re: KVM call minutes for Sept 21
From: Nadav Har'El @ 2010-09-22 16:29 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Chris Wright, kvm, avi
On Wed, Sep 22, 2010, Gleb Natapov wrote about "Re: KVM call minutes for Sept 21":
> There is only one outstanding serious issue from my point of view: event
> injection path. I want it to be similar to how nested SVM handles it. I
> don't see why it can't be done the same way for VMX too. The way nested SVM
> does it looks cleaner and making code paths similar will allow us to
> consolidate the logic in common code later. This issue is too
> fundamental to be fixed after merge IMHO. Other nitpicks, about missing
> checks that real HW does but the emulation doesn't, can be fixed any time
> after merge.
I'll try my best to accommodate your request, but I tried to explain in my
previous mails (and so did Orit Wasserman in her mails last year, by the way -
I found a long thread in the mailing list...) that there appears to be a
fundamental additional complexity in VMX that doesn't exist in SVM. In VMX,
you might have to inject another exception (IDT_VECTORING_INFO_FIELD) at the
same time that you're already trying to inject a page fault to L1, and this
doesn't appear (?) to exist in SVM.
However, since I didn't write this code myself, and didn't encounter all the
problems myself, I still want to try to see whether I can get "cleaner" code
to actually work. But I want it to be really cleaner - not just remove one
somewhat-ugly intervention from vmx_complete_interrupts() and move it to an
even uglier intervention somewhere else.
In any case, while I obviously agree that it's your prerogative not to merge
code that you consider ugly, I still don't see any particular problem to start
with the current, working, code, and fix it later. It's not like we can never
change this code after it's in - it's clearly marked with if(nested) and
doesn't affect anything in the non-nested path.
> I think the question was "why do we need nested virtualization" ;)
Then why was nested SVM merged in the first place? Isn't it too late to
ask this question now? :-)
Anyway, I tried to answer this question in my previous email.
I'm not sure what more I can say to answer this question better.
Thanks,
Nadav.
--
Nadav Har'El | Wednesday, Sep 22 2010, 15 Tishri 5771
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Give Yogi a rifle. Support your right to
http://nadav.harel.org.il |arm bears!
* Re: KVM call minutes for Sept 21
From: Gleb Natapov @ 2010-09-22 17:47 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 06:29:00PM +0200, Nadav Har'El wrote:
> On Wed, Sep 22, 2010, Gleb Natapov wrote about "Re: KVM call minutes for Sept 21":
> > There is only one outstanding serious issue from my point of view: event
> > injection path. I want it to be similar to how nested SVM handles it. I
> > don't see why it can't be done the same way for VMX too. The way nested SVM
> > does it looks cleaner and making code paths similar will allow us to
> > consolidate the logic in common code later. This issue is too
> fundamental to be fixed after merge IMHO. Other nitpicks, about missing
> checks that real HW does but the emulation doesn't, can be fixed any time
> after merge.
>
> I'll try my best to accomodate your request, but I tried to explain in my
> previous mails (and so dir Orit Wasserman in her mails last year, by the way -
> I found a long thread in the mailing list...) that there appears to be a
fundamental additional complexity in VMX that doesn't exist in SVM. In VMX,
> you might have to inject another exception (IDT_VECTORING_INFO_FIELD) at the
> same time that you're already trying to inject a page fault to L1, and this
> doesn't appear (?) to exist in SVM.
exitintinfo. Really SVM and VMX event injection are practically
identical.
> However, since I didn't write this code myself, and didn't encounter all the
> problems myself, I still want to try to see whether I can get "cleaner" code
> to actually work. But I want it to be really cleaner - not just remove one
> somewhat-ugly intervention from vmx_complete_interrupts() and move it to an
> even uglier intervention somewhere else.
>
> In any case, while I obviously agree that it's your prerogative not to merge
> code that you consider ugly, I still don't see any particular problem to start
> with the current, working, code, and fix it later. It's not like we can never
> change this code after it's in - it's clearly marked with if(nested) and
> doesn't affect anything in the non-nested path.
>
After code is merged there is much less incentive to change things
drastically.
--
Gleb.
* Re: KVM call minutes for Sept 21
2010-09-22 1:48 ` Chris Wright
@ 2010-09-22 17:49 ` Nadav Har'El
2010-09-22 18:03 ` Anthony Liguori
2010-09-22 19:48 ` Joerg Roedel
0 siblings, 2 replies; 22+ messages in thread
From: Nadav Har'El @ 2010-09-22 17:49 UTC (permalink / raw)
To: Chris Wright; +Cc: kvm, avi
On Tue, Sep 21, 2010, Chris Wright wrote about "Re: KVM call minutes for Sept 21":
> People keep looking for reasons to justify the cost of the effort, dunno
> why "because it's cool" isn't good enough ;) At any rate, that was mainly
> a question of how it might be useful for production kind of environments.
I gave in my previous mail a long list of examples of what you might do with
nested virtualization, and many of them could be called "production kind of
environments".
Let me give you one small example that I recently encountered, although by
no means do I think this is the best example, nor the most important one.
One of my colleagues wanted to run tests on a particular software product.
Following the recent virtualization trend, he didn't buy a physical test
machine, but rather rented a virtual machine on an internal compute-cloud
service similar in spirit to Amazon's EC2, and ran his test on this virtual
machine.
The problem he then faced was that he actually hoped he could run his test
on several different operating systems - e.g., several versions of Linux and
Windows. No problem - he would just start multiple virtual machines - either
concurrently or in series - each from a different image and running a different
OS.
But there was a big cost problem: Like Amazon's service, this service also
charged by full hours (if you use 10 minutes, you are charged for a full hour),
and worse - had a virtual-machine start/destroy cost. So if his test needed
to run for 10 minutes on Windows XP, then 10 minutes on Windows 7, then
10 minutes on Linux, he would pay 3 times more than he would to get one
virtual machine for the full hour. Moreover, he would need software to
automate all this succession of virtual machine startups and stops.
What he could have used is nested virtualization: He could get one virtual
machine for 30 minutes, and run on it a nested hypervisor and in it his
own 3 virtual machines, the two Windows and one Linux. Moreover, he would
have one image that contains this internal setup, making it easy to start
and stop this entire test setup anytime, anywhere. In essence, nested
virtualization would allow him to easily and cheaply sub-divide and organize
the one virtual machine he is renting - exactly like virtualization allowed
doing the same on one physical machine.
Again, this is just one example of a need that I encountered last week from
an actual user of a real cloud service. By no means do I think this is the
only example, the best example, or the example that gives the most business
value.
> If there are remaining issues that could be done by someone else, this
> might be helpful. Otherwise, probably only useful to you ;)
In theory (if we have a public git repository to track this), there is no
reason not to divide the remaining issues between people. For example, one
person could fix the IDT code that bothered Gleb, while another person
reorders the vmcs12 structure as requested in another review, and a
third person writes tests. All we'd need is a repository to work on the code
together. KVM's main repository would of course be best, which is why I'm
hoping to get these patches checked-in, rather than continue to work
separately like we have been doing.
> - has long term maintenance issues
>
> And that means that there's two halves to the feature. One is the nested
> VMX code itself, for example each of the new EXIT_REASON_VM* handlers.
> The other is glue to the rest of KVM, for example, interrupt injection done
> optimally. Both have long term maintenance issues, but adding complexity
> to core KVM was the context here.
I believe that in the current state of the code, nested VMX adds little
complexity to the non-nested code - just a few if's. Of course, it also
adds a lot of new code, but none of this code gets run in the non-nested
case.
The maintenance issues I see are the other way around - i.e., once
in a while when non-nested changes are made to KVM, nested stops working and
needs to be fixed. A prime example of this was the lazy FPU loading added in
the beginning of the year, which broke our assumption that L0's
CR0_GUEST_HOST_MASK always has all its bits on, making nested stop working
until I fixed it (it wasn't easy debugging these problems ;-)).
I wholeheartedly agree that if nobody continues to maintain nested VMX,
it can and will become "stale" and may stop working after unrelated code
in KVM is modified. Adding tests can help here (so that when someone modifies
some non-nested KVM feature he will at least know that he broke nested), but
definitely, we'll need to continue having someone who is interested in
keeping the nested VMX working. In the foreseeable future, I'll volunteer
to be that someone.
Nadav.
--
Nadav Har'El | Wednesday, Sep 22 2010, 15 Tishri 5771
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |All those who believe in psychokinesis,
http://nadav.harel.org.il |raise my hand.
* Re: KVM call minutes for Sept 21
2010-09-22 17:49 ` Nadav Har'El
@ 2010-09-22 18:03 ` Anthony Liguori
2010-09-22 19:34 ` Joerg Roedel
2010-09-22 19:48 ` Joerg Roedel
1 sibling, 1 reply; 22+ messages in thread
From: Anthony Liguori @ 2010-09-22 18:03 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm, avi
On 09/22/2010 12:49 PM, Nadav Har'El wrote:
> On Tue, Sep 21, 2010, Chris Wright wrote about "Re: KVM call minutes for Sept 21":
>
>> People keep looking for reasons to justify the cost of the effort, dunno
>> why "because it's cool" isn't good enough ;) At any rate, that was mainly
>> a question of how it might be useful for production kind of environments.
>>
> I gave in my previous mail a long list of examples what you might do with
> nested virtualization, and many of them could be called "production kind of
> environments".
>
I don't think arguing about use cases is very productive.
The concern is that nested VMX is invasive and presents a long term
maintenance burden. There are two ways to mitigate this burden. The
first is to work extra hard to make things as common as humanly possible
between nested VMX and nested SVM. The second is to make sure that we
have an aggressive set of test cases.
I think the latter is perhaps the most important point of all.
Regards,
Anthony Liguori
* Re: KVM call minutes for Sept 21
2010-09-22 17:47 ` Gleb Natapov
@ 2010-09-22 19:20 ` Joerg Roedel
2010-09-22 20:18 ` Gleb Natapov
2010-09-26 14:03 ` Avi Kivity
0 siblings, 2 replies; 22+ messages in thread
From: Joerg Roedel @ 2010-09-22 19:20 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Nadav Har'El, Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 07:47:06PM +0200, Gleb Natapov wrote:
> On Wed, Sep 22, 2010 at 06:29:00PM +0200, Nadav Har'El wrote:
> > In any case, while I obviously agree that it's your prerogative not to merge
> > code that you consider ugly, I still don't see any particular problem to start
> > with the current, working, code, and fix it later. It's not like we can never
> > change this code after it's in - it's clearly marked with if(nested) and
> > doesn't affect anything in the non-nested path.
> >
> After code is merged there is much less incentive to change things
> drastically.
I think nested svm is a good counter example to that. It has drastically
improved since it was merged. Ok, it hasn't _changed_ drastically, but
what drastic changes do we expect to become necessary in the nested-vmx
code?
Joerg
* Re: KVM call minutes for Sept 21
2010-09-22 18:03 ` Anthony Liguori
@ 2010-09-22 19:34 ` Joerg Roedel
0 siblings, 0 replies; 22+ messages in thread
From: Joerg Roedel @ 2010-09-22 19:34 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Nadav Har'El, Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 01:03:55PM -0500, Anthony Liguori wrote:
> On 09/22/2010 12:49 PM, Nadav Har'El wrote:
>> On Tue, Sep 21, 2010, Chris Wright wrote about "Re: KVM call minutes for Sept 21":
>>
>>> People keep looking for reasons to justify the cost of the effort, dunno
>>> why "because it's cool" isn't good enough ;) At any rate, that was mainly
>>> a question of how it might be useful for production kind of environments.
>>>
>> I gave in my previous mail a long list of examples what you might do with
>> nested virtualization, and many of them could be called "production kind of
>> environments".
>>
>
> I don't think arguing about use cases is very productive.
>
> The concern is that nested VMX is invasive and presents a long term
> maintenance burden. There are two ways to mitigate this burden. The
> first is to work extra hard to make things as common as humanly possible
> between nested VMX and nested SVM. The second is to make sure that we
> have an aggressive set of test cases.
This can be stated about nested virtualization support in KVM in
general. And since we need to solve this for nested SVM too I don't
think these questions should prevent the merge of nested VMX.
Okay, nested VMX is very new. I don't know the current state and quality
of the patches and the testing they have experienced. But if they are in
the same quality state as nested SVM when it was merged I think nested
VMX should not be blocked. The code should be disabled by default of
course for some time to minimize the risks. One big advantage of merging
is that Nadav has a lot more time to improve the code instead of having
to rebase it all the time.
Joerg
* Re: KVM call minutes for Sept 21
2010-09-22 17:49 ` Nadav Har'El
2010-09-22 18:03 ` Anthony Liguori
@ 2010-09-22 19:48 ` Joerg Roedel
1 sibling, 0 replies; 22+ messages in thread
From: Joerg Roedel @ 2010-09-22 19:48 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 07:49:43PM +0200, Nadav Har'El wrote:
> I believe that in the current state of the code, nested VMX adds little
> complexity to the non-nested code - just a few if's. Of course, it also
> adds a lot of new code, but none of this code gets run in the non-nested
> case.
As it is with Nested-SVM.
> The maintenance issues I see are the other way around - i.e., once
> in a while when non-nested changes are made to KVM, nested stops working and
> needs to be fixed. A prime example of this was the lazy FPU loading added in
> the beginning of the year, which broke our assumption that L0's
> CR0_GUEST_HOST_MASK always has all its bits on, making nested stop working
> until I fixed it (it wasn't easy debugging these problems ;-)).
> I wholeheartedly agree that if nobody continues to maintain nested VMX,
> it can and will become "stale" and may stop working after unrelated code
> in KVM is modified. Adding tests can help here (so that when someone modifies
> some non-nested KVM feature he will at least know that he broke nested), but
> definitely, we'll need to continue having someone who is interested in
> keeping the nested VMX working. In the forseeable future, I'll volunteer
> to be that someone.
I know very well what you are talking about. It has happend a couple of
times to nested SVM that it broke because of other unrelated patches. I
also had to fix nested SVM when the new lazy FPU switching code was
merged. The best way to cope with that in the future is to restructure
the code so that it is less likely to break.
One example: I had bugs where the generic KVM code called into SVM
specific parts which intended to change state in the L1 VMCB. But since
L2 was running, the change landed in the VMCB of L2 and got lost when the
next vmexit was emulated (which is really bad for tsc_offset, for
example).
Another thing is intercepts. When KVM wants to change the intercept
masks for L1 you have to recalculate the merged intercept masks for L2.
The best strategy to cope with that is to add accessor functions which
change L1 state and which are aware of nesting.
Joerg
* Re: KVM call minutes for Sept 21
2010-09-22 19:20 ` Joerg Roedel
@ 2010-09-22 20:18 ` Gleb Natapov
2010-09-22 23:00 ` Nadav Har'El
2010-09-26 14:03 ` Avi Kivity
1 sibling, 1 reply; 22+ messages in thread
From: Gleb Natapov @ 2010-09-22 20:18 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Nadav Har'El, Chris Wright, kvm, avi
On Wed, Sep 22, 2010 at 09:20:38PM +0200, Joerg Roedel wrote:
> On Wed, Sep 22, 2010 at 07:47:06PM +0200, Gleb Natapov wrote:
> > On Wed, Sep 22, 2010 at 06:29:00PM +0200, Nadav Har'El wrote:
>
> > > In any case, while I obviously agree that it's your prerogative not to merge
> > > code that you consider ugly, I still don't see any particular problem to start
> > > with the current, working, code, and fix it later. It's not like we can never
> > > change this code after it's in - it's clearly marked with if(nested) and
> > > doesn't affect anything in the non-nested path.
> > >
> > After code is merged there is much less incentive to change things
> > drastically.
>
> I think nested svm is a good counter example to that. It has drastically
> improved since it was merged. Ok, it hasn't _changed_ drastically, but
> what drastic changes do we expect to become necessary in the nested-vmx
> code?
>
As I wrote in another mail, I want event injection to be closer to
what SVM does. All well-maintained code improves with time; even
rarely-touched parts are reworked. Nadav said that he doesn't even know
how this part of the code works. This is worrying.
--
Gleb.
* Re: KVM call minutes for Sept 21
2010-09-22 20:18 ` Gleb Natapov
@ 2010-09-22 23:00 ` Nadav Har'El
0 siblings, 0 replies; 22+ messages in thread
From: Nadav Har'El @ 2010-09-22 23:00 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Joerg Roedel, Chris Wright, kvm, avi
On Wed, Sep 22, 2010, Gleb Natapov wrote about "Re: KVM call minutes for Sept 21":
> are reworked even if maintained. Nadav said that he doesn't even know
> how this part of the code works. This is worrying.
Hi,
I just wanted to clarify that the reason I don't know exactly how this specific
part of the code works, is because I didn't write it. It doesn't mean that
I think it is so complex that nobody can ever understand it, or that there
is a cause for worry.
The people who wrote this code were convinced (see thread from October 2009,
starting with http://www.mail-archive.com/kvm@vger.kernel.org/msg23898.html)
that their approach was the right thing to do for the IDT_VECTORING_INFO.
Between them being convinced that this is the right way, and you being
convinced that it is the wrong way, I am not (yet) convinced about either
direction. Before I'm quick to simply get rid of this (working) code and
replace it with something else, I need to understand all the little details
involved, and to try to rewrite the code to be more nested-SVM-like and still
work, and to understand how I might test whether it actually works (and
it isn't simply that my workload misses this case altogether). I'll do this.
Nadav.
--
Nadav Har'El | Thursday, Sep 23 2010, 15 Tishri 5771
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I'm a peripheral visionary: I see into
http://nadav.harel.org.il |the future, but mostly off to the sides.
* Re: KVM call minutes for Sept 21
2010-09-22 0:04 ` Nadav Har'El
2010-09-22 1:48 ` Chris Wright
2010-09-22 9:02 ` Gleb Natapov
@ 2010-09-26 13:27 ` Avi Kivity
2010-09-26 14:28 ` Nadav Har'El
2 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2010-09-26 13:27 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm
On 09/22/2010 02:04 AM, Nadav Har'El wrote:
> Hi, thanks for the summary. I also listened-in on the call. I'm glad these issues are being discussed.
>
> On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> > Nested VMX
> > - looking for forward progress and better collaboration between the
> > Intel and IBM teams
>
> I'll be very happy if anyone, be it from Intel or somewhere else, would like
> to help me work on nested VMX.
>
> Somebody (I don't recognize your voices yet, sorry...) mentioned on the call
> that there might not be much point in cooperation before I finish getting
nested VMX merged into KVM. I agree, but my conclusion is different from what
> I think the speaker implied: My conclusion is that it is important that we
> merge the nested VMX code into KVM as soon as possible, because if nested VMX
> is part of KVM (and not a set of patches which becomes stale the moment after
> I release it) this will make it much easier for people to test it, use it,
> and cooperate in developing it.
Don't worry, I want to merge nvmx as soon as possible (but not sooner).
> > - needs more review (not a new issue)
>
> I think the reviews that nested VMX has received over the past year (thanks
> to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others), have been
> fantastic. You guys have shown deep understanding of the code, and found
> numerous bugs, oversights, missing features, and also a fair share of ugly
code, and we (first Orit and Abel, and then I) have done our best to fix all
> of these issues. I've personally learned a lot from the latest round of
> reviews, and the discussions with you.
>
> So I don't think there has been any lack of reviews. I don't think that
> getting more reviews is the most important task ahead of us.
The code is so incredibly complex that each review round raises new
issues, simply because they were hidden by other issues previously, or
because the reviewer's understanding only reached the point where they
can notice it recently. Usually after a few rounds the review
converges. This hasn't yet happened with nvmx, because it is so
complicated.
> Surely, if more people review the code, more potential bugs will be spotted.
> But this is always the case, with any software. I think the question now
> is, what would it take to finally declare the code as "good enough to be
> merged", with the understanding that even after being merged it will still be
> considered an experimental feature, disabled by default and documented as
> experimental. Nested SVM was also merged before it was perfect, and also
> KVM itself was released before being perfect :-)
The bar is set higher than ever before.
My goal is not so much to get perfect vmx emulation, instead to get code
I can understand and modify.
> > - use cases
>
> I don't kid myself that as soon as nested VMX is available in KVM, millions
> of users worldwide will flock to use it. Definitely, many KVM users will never
> find a need for nested virtualization. But I do believe that there are many
> use cases. We outlined some of them in our paper (to be presented in a couple
> of weeks in OSDI):
>
> 1. Hosting one of the new breed of operating systems which have a hypervisor
> as part of them. Windows 7 with XP mode is one example. Linux with KVM
> is another.
>
> 2. Platforms with embedded hypervisors in firmware need nested virt to
> run any workload - which can itself be a hypervisor with guests.
>
> 3. Clouds users could put in their virtual machine a hypervisor with
> sub-guests, and run multiple virtual machines on the one virtual machine
> which they get.
>
> 4. Enable live migration of entire hypervisors with their guests - for
> load balancing, disaster recovery, and so on.
>
> 5. Honeypots and protection against hypervisor-level rootkits
>
> 6. Make it easier to test, demonstrate, benchmark and debug hypervisors,
> and also entire virtualization setups. An entire virtualization setup
> (hypervisor and all its guests) could be run as one virtual machine,
> allowing testing many such setups on one physical machine.
>
> By the way, I find the question of "why do we need nested VMX" a bit odd,
> seeing that KVM already supports nested virtualization (for SVM). Is it the
> case that nested virtualization was found useful on AMD processors, but for
> Intel processors, it isn't? Of course not :-) I think KVM should support
> nested virtualization on neither architecture, or on both - and of course
> I think it should be on both :-)
Don't worry, nvmx will not get rejected on those grounds. However, the
lack of use cases is worrying. I haven't seen the use cases you list
above used with nsvm, and no bug reports from users either.
> > - work todo
> > - merge baseline patch
> > - looks pretty good
> > - review is finding mostly small things at this point
> > - need some correctness verification (both review from Intel and testing)
> > - need a test suite
> > - test suite harness will help here
> > - a few dozen nested SVM tests are there, can follow for nested VMX
> > - nested EPT
>
> I've been keeping track of the issues remaining from the last review, and
> indeed only a few remain. Only 8 of the 24 patches have any outstanding
> issue, and I'm working on those that remain, as you could see on the mailing
> list in the last couple of weeks. If there's interest, I can even summarize
these remaining issues.
No need. But, as we haven't converged yet, we may see new review items.
> But since I'm working on these patches alone, I think we need to define our
> priorities. Most of the outstanding review comments, while absolutely correct
> (and I was amazed by the quality of the reviewer's comments), deal with
> re-writing code that already works (to improve its style) or fixing relatively
> rare cases. It is not clear that these issues are more important than the
> other things listed in the summary above (test suite, nested EPT), but as
> long as I continue to rewrite pieces of the nested VMX code, I'll never get
> to those other important things.
Corner cases in emulation are less important; we can get away with a printk()
a printk() when we know we're not doing the right thing. What I don't
want is silent breakages, these become impossible to debug.
What's absolutely critical is having code that is understood by more
people than just you. Part of the process of increasing knowledge of
the code is reading it and raising issues... nothing much we can do to
change it.
> To summarize, I'd love for us to define some sort of plan or roadmap on
> what we (or I) need to do before we can finally merge the nested VMX code
> into KVM. I would love for this roadmap to be relatively short, leaving
> some of the outstanding issues to be done after the merge.
>
> > - optimize (reduce vmreads and vmwrites)
>
> Before we implemented nested VMX, we also feared that the exits on vmreads and
> vmwrites will kill the performance. As you can see in our paper (see preprint
> in http://nadav.harel.org.il/papers/nested-osdi10.pdf), we actually showed
> that this is not the case - while these extra exits do hurt performance,
> in common workloads (i.e., not pathological worst-case scenarios), the
> trapping vmread/vmwrite only moderately hurt performance. For example, with
> kernbench the nested overhead (over single-level virtualization) was 14.5%,
> which could have been reduced to 10.3% if vmread/vmwrite didn't trap.
> For the SPECjbb workloads, the numbers are 7.8% vs. 6.3%. As you can see,
> the numbers would be better if it weren't for the L1 vmread/vmwrites trapping,
> but the difference is not huge. Certainly we can start with a version that
> doesn't do anything about this issue.
Sure. But I think that for I/O intensive benchmarks the slowdown will
be much bigger.
> So I don't think there is any urgent need to optimize nested VMX (the L0)
> or the behavior of KVM as L1. Of course, there's always a long-term desire
> to continue optimizing it.
>
> > - has long term maintan
>
> We have been maintaining this patch set for well over a year now, so I think
we've shown long term interest in maintaining it, even across personnel
> changes. In any case, it would have been much easier for us - and for other
> people - to maintain this patch if it was part of KVM, and we wouldn't need
> to take care of rebasing when KVM changes.
>
>
I'm worried about maintaining core vmx after nvmx is merged, not nvmx
itself. There are simply many more things to consider when making a change.
wrt a roadmap, clearly the merge takes precedence, and nested ept would
be a strong candidate for second effort. I don't have a lot of advice
about speeding up the merge except to reduce the cycle time for
patchsets, which is currently quite long.
Note that we can work on a test suite in parallel with the kvm code.
Look at kvm-unit-tests.git x86/svm.c.
--
error compiling committee.c: too many arguments to function
* Re: KVM call minutes for Sept 21
2010-09-22 19:20 ` Joerg Roedel
2010-09-22 20:18 ` Gleb Natapov
@ 2010-09-26 14:03 ` Avi Kivity
2010-09-26 20:25 ` Joerg Roedel
1 sibling, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2010-09-26 14:03 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Gleb Natapov, Nadav Har'El, Chris Wright, kvm
On 09/22/2010 09:20 PM, Joerg Roedel wrote:
> On Wed, Sep 22, 2010 at 07:47:06PM +0200, Gleb Natapov wrote:
> > On Wed, Sep 22, 2010 at 06:29:00PM +0200, Nadav Har'El wrote:
>
> > > In any case, while I obviously agree that it's your prerogative not to merge
> > > code that you consider ugly, I still don't see any particular problem to start
> > > with the current, working, code, and fix it later. It's not like we can never
> > > change this code after it's in - it's clearly marked with if(nested) and
> > > doesn't affect anything in the non-nested path.
> > >
> > After code is merged there is much less incentive to change things
> > drastically.
>
> I think nested svm is a good counter example to that. It has drastically
> improved since it was merged. Ok, it hasn't _changed_ drastically, but
> what drastic changes do we expect to become necessary in the nested-vmx
> code?
>
I don't expect drastic changes, but then, I still don't understand it well.
Part of the review process is the maintainer becoming familiar (and, in
some cases, comfortable) with the code. The nit-picking is often just
me proving to myself that I understand what's happening.
btw, speaking of drastic changes to nsvm, one thing I'd like to see is
the replacement of those kmaps with something like put_user_try() and
put_user_catch(). It should be as fast (or faster) than kmaps, and not
affect preemptibility.
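Roughly, the suggestion amounts to something like the following kernel-style
pseudocode (put_user_try/put_user_ex/put_user_catch are the x86 uaccess
helpers; the surrounding structure names are illustrative, not nsvm's actual
code):

```c
/* Instead of kmap()ing the guest page that holds the nested VMCB and
 * writing through the mapping, write the fields directly to the guest
 * address with the exception-table based helpers: */
put_user_try {
	put_user_ex(vmcb->control.exit_code, &guest_vmcb->control.exit_code);
	put_user_ex(vmcb->control.exit_info_1, &guest_vmcb->control.exit_info_1);
} put_user_catch(err);
```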
--
error compiling committee.c: too many arguments to function
* Re: KVM call minutes for Sept 21
2010-09-26 13:27 ` Avi Kivity
@ 2010-09-26 14:28 ` Nadav Har'El
2010-09-26 14:50 ` Avi Kivity
0 siblings, 1 reply; 22+ messages in thread
From: Nadav Har'El @ 2010-09-26 14:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: Chris Wright, kvm
On Sun, Sep 26, 2010, Avi Kivity wrote about "Re: KVM call minutes for Sept 21":
> Don't worry, I want to merge nvmx as soon as possible (but not sooner).
Thanks, I'm happy to hear that.
> >So I don't think there has been any lack of reviews. I don't think that
> >getting more reviews is the most important task ahead of us.
>
> The code is so incredibly complex that each review round raises new
> issues, simply because they were hidden by other issues previously, or
> because the reviewer's understanding only reached the point where they
> can notice it recently. Usually after a few rounds the review
> converges.
> This hasn't yet happened with nvmx, because it is so complicated.
I completely agree. If you look at the sheer size of the VMX spec, it is
simply unavoidable that nested VMX, which is basically a VMX implementation,
will be complex. So if you want the nested VMX feature in KVM (and I
understand that you do), then you can't avoid this added complexity...
What is now my task is to do my best to make sure that while nested VMX
is complex, it shouldn't be more complex than it needs to be. But it
can't be less complex than it needs to be ;-)
> Don't worry, nvmx will not get rejected on those grounds. However, the
> lack of use cases is worrying. I haven't seen the use cases you list
> above used with nsvm, and no bug reports from users either.
I don't know why nested x86 virtualization hasn't caught on. I'll be honest
and admit that - yes - it's possible that it's simply not useful to actual
users.
But there are other possibilities - which I'd like to think are the right ones.
Maybe it's a chicken-and-egg problem: Perhaps nobody knows how to use nested
virtualization because the most common hypervisor (vmware) doesn't support it,
and common clouds (e.g., Amazon EC2) don't support these kind of use cases.
Maybe the number of people who use KVM *and* AMD *and* need nested
virtualization is not big enough to have any sort of critical mass for building
use cases for this feature. And maybe nested virtualization is one of those
features that is useless to the majority of users, but to the few who need it,
it is very important.
I've been told that IBM System Z has supported (hardware-assisted) nested
virtualization right from the start (in the 70s), and there people actually
use nested virtualization all the time, typically of depth 2. I understand
(but admit that I don't have any first-hand knowledge of this) that nested
virtualization is mostly useful there for organization needs - one very
strong machine is divided into many virtual machines, and often a need arises
for a second level of organization. Maybe someone on this list has actual
experience with these systems and can relate war-stories about them?
> No need. But, as we haven't converged yet, we may see new review items.
Of course, I'm used to those by now :-)
> What's absolutely critical is having code that is understood by more
> people than just you. Part of the process of increasing knowledge of
> the code is reading it and raising issues... nothing much we can do to
> change it.
I agree. It's definitely a really bad thing when only one person knows any
part of the code, and I'm happy that you've dived so deeply into the code
(and even though you feel like you don't understand it - I feel that you
do :-)). There are more people who understand much of this code even better
than I do (the people who wrote it originally), and of course anybody can
learn this code just like I did this year.
> I'm worried about maintaining core vmx after nvmx is merged, not nvmx
> itself. There are simply many more things to consider when making a change.
Right, but how can we avoid this issue, assuming that you do want nvmx in?
May I ask how this affected nested SVM?
> Note that we can work on a test suite in parallel with the kvm code.
> Look at kvm-unit-tests.git x86/svm.c.
I will.
--
Nadav Har'El | Sunday, Sep 26 2010, 18 Tishri 5771
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I'm a peripheral visionary: I see into
http://nadav.harel.org.il |the future, but mostly off to the sides.
* Re: KVM call minutes for Sept 21
2010-09-26 14:28 ` Nadav Har'El
@ 2010-09-26 14:50 ` Avi Kivity
0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2010-09-26 14:50 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Chris Wright, kvm
On 09/26/2010 04:28 PM, Nadav Har'El wrote:
> > I'm worried about maintaining core vmx after nvmx is merged, not nvmx
> > itself. There are simply many more things to consider when making a change.
>
> Right, but how can we avoid this issue, assuming that you do want nvmx in?
We can't avoid it. We can mitigate it to some extent by structuring the
code correctly.
> May I ask how this affected nested SVM?
>
I kept breaking nsvm when making changes to core svm, and Joerg kept
fixing them.
--
error compiling committee.c: too many arguments to function
* Re: KVM call minutes for Sept 21
2010-09-26 14:03 ` Avi Kivity
@ 2010-09-26 20:25 ` Joerg Roedel
2010-09-27 8:36 ` Avi Kivity
0 siblings, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2010-09-26 20:25 UTC (permalink / raw)
To: Avi Kivity; +Cc: Gleb Natapov, Nadav Har'El, Chris Wright, kvm
On Sun, Sep 26, 2010 at 04:03:13PM +0200, Avi Kivity wrote:
> I don't expect drastic changes, but then, I still don't understand it well.
>
> Part of the review process is the maintainer becoming familiar (and, in
> some cases, comfortable) with the code. The nit-picking is often just
> me proving to myself that I understand what's happening.
Right, understanding is an important part. One thing I try to achieve
for nested-svm is to make it less likely that unrelated code changes
break it. One step will be accessor functions to change intercept masks
and tsc_offset.
> btw, speaking of drastic changes to nsvm, one thing I'd like to see is
> the replacement of those kmaps with something like put_user_try() and
> put_user_catch(). It should be as fast (or faster) than kmaps, and not
> affect preemptibility.
Yes, I want to get rid of them too. I thought about using
copy_from/to_user in the vmrun/vmexit path. I need to measure if this
has any performance impact, though.
But the vmrun/vmexit path in nested-svm will see some major changes in
the near future anyway to improve performance and prepare it for
clean-bits emulation. In this step I will also address the kmap
problem. But first on the list for me is to make the instruction
emulator aware of instruction intercepts. Security is more important
than performance.
Joerg
* Re: KVM call minutes for Sept 21
2010-09-26 20:25 ` Joerg Roedel
@ 2010-09-27 8:36 ` Avi Kivity
2010-09-27 14:18 ` Gleb Natapov
0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2010-09-27 8:36 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Gleb Natapov, Nadav Har'El, Chris Wright, kvm
On 09/26/2010 10:25 PM, Joerg Roedel wrote:
> > btw, speaking of drastic changes to nsvm, one thing I'd like to see is
> > the replacement of those kmaps with something like put_user_try() and
> > put_user_catch(). It should be as fast (or faster) than kmaps, and not
> > affect preemptibility.
>
> Yes, I want to get rid of them too. I thought about using
> copy_from/to_user in the vmrun/vmexit path. I need to measure if this
> has any performance impact, though.
copy_to_user() is slow since it is very generic. put_user() generally
translates to one instruction (perhaps a range check as well). We can
avoid the range check if we do it once for the entire vmcb page.
Gleb had something along those lines, it's useful for kvmclock as well.
> But the vmrun/vmexit path in nested-svm will see some major changes in
> the near future anyway to improve performance and prepare it for
> clean-bits emulation. In this step I will also address the kmap
> problem. But first on the list for me is to make the instruction
> emulator aware of instruction intercepts. Security is more important
> than performance.
>
Amen.
--
error compiling committee.c: too many arguments to function
* Re: KVM call minutes for Sept 21
2010-09-27 8:36 ` Avi Kivity
@ 2010-09-27 14:18 ` Gleb Natapov
2010-09-27 14:22 ` Avi Kivity
0 siblings, 1 reply; 22+ messages in thread
From: Gleb Natapov @ 2010-09-27 14:18 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Roedel, Nadav Har'El, Chris Wright, kvm
On Mon, Sep 27, 2010 at 10:36:57AM +0200, Avi Kivity wrote:
> On 09/26/2010 10:25 PM, Joerg Roedel wrote:
> >> btw, speaking of drastic changes to nsvm, one thing I'd like to see is
> >> the replacement of those kmaps with something like put_user_try() and
> >> put_user_catch(). It should be as fast (or faster) than kmaps, and not
> >> affect preemptibility.
> >
> >Yes, I want to get rid of them too. I thought about using
> >copy_from/to_user in the vmrun/vmexit path. I need to measure if this
> >has any performance impact, though.
>
> copy_to_user() is slow since it is very generic. put_user()
> generally translates to one instruction (perhaps a range check as
> well). We can avoid the range check if we do it once for the entire
> vmcb page.
>
> Gleb had something along those lines, it's useful for kvmclock as well.
>
Well, since you asked to make it generic it uses copy_to_user() now :)
It tracks the slot version so the gfn_to_hva() translation is omitted most
of the time.
--
Gleb.
* Re: KVM call minutes for Sept 21
2010-09-27 14:18 ` Gleb Natapov
@ 2010-09-27 14:22 ` Avi Kivity
0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2010-09-27 14:22 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Joerg Roedel, Nadav Har'El, Chris Wright, kvm
On 09/27/2010 04:18 PM, Gleb Natapov wrote:
> On Mon, Sep 27, 2010 at 10:36:57AM +0200, Avi Kivity wrote:
> > On 09/26/2010 10:25 PM, Joerg Roedel wrote:
> > >> btw, speaking of drastic changes to nsvm, one thing I'd like to see is
> > >> the replacement of those kmaps with something like put_user_try() and
> > >> put_user_catch(). It should be as fast (or faster) than kmaps, and not
> > >> affect preemptibility.
> > >
> > >Yes, I want to get rid of them too. I thought about using
> > >copy_from/to_user in the vmrun/vmexit path. I need to measure if this
> > >has any performance impact, though.
> >
> > copy_to_user() is slow since it is very generic. put_user()
> > generally translates to one instruction (perhaps a range check as
> > well). We can avoid the range check if we do it once for the entire
> > vmcb page.
> >
> > Gleb had something along those lines, it's useful for kvmclock as well.
> >
> Well, since you asked to make it generic it uses copy_to_user() now :)
> It tracks slot version so gfn_to_hva() translation is omitted most of
> the times.
Sure, that's easy to optimize later on.
--
error compiling committee.c: too many arguments to function
Thread overview: 22+ messages
2010-09-21 18:05 KVM call minutes for Sept 21 Chris Wright
2010-09-21 18:23 ` Anthony Liguori
2010-09-22 0:04 ` Nadav Har'El
2010-09-22 1:48 ` Chris Wright
2010-09-22 17:49 ` Nadav Har'El
2010-09-22 18:03 ` Anthony Liguori
2010-09-22 19:34 ` Joerg Roedel
2010-09-22 19:48 ` Joerg Roedel
2010-09-22 9:02 ` Gleb Natapov
2010-09-22 16:29 ` Nadav Har'El
2010-09-22 17:47 ` Gleb Natapov
2010-09-22 19:20 ` Joerg Roedel
2010-09-22 20:18 ` Gleb Natapov
2010-09-22 23:00 ` Nadav Har'El
2010-09-26 14:03 ` Avi Kivity
2010-09-26 20:25 ` Joerg Roedel
2010-09-27 8:36 ` Avi Kivity
2010-09-27 14:18 ` Gleb Natapov
2010-09-27 14:22 ` Avi Kivity
2010-09-26 13:27 ` Avi Kivity
2010-09-26 14:28 ` Nadav Har'El
2010-09-26 14:50 ` Avi Kivity