public inbox for kvm@vger.kernel.org
From: Zachary Amsden <zamsden@redhat.com>
To: Alexander Graf <agraf@suse.de>
Cc: "Nadav Har'El" <nyh@math.technion.ac.il>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"joro@8bytes.org" <joro@8bytes.org>, Avi Kivity <avi@redhat.com>
Subject: Re: TSC in nested SVM and VMX
Date: Sat, 02 Oct 2010 12:46:49 -1000
Message-ID: <4CA7B659.6070808@redhat.com>
In-Reply-To: <E784E4D6-238B-4D96-9940-705676B46724@suse.de>

On 10/02/2010 01:19 AM, Alexander Graf wrote:
> On 02.10.2010, at 03:56, Alexander Graf wrote:
>
>    
>> On 01.10.2010, at 21:22, Zachary Amsden <zamsden@redhat.com> wrote:
>>
>>      
>>> On 10/01/2010 04:46 AM, Alexander Graf wrote:
>>>        
>>>> On 01.10.2010, at 13:21, Nadav Har'El wrote:
>>>>
>>>>
>>>>          
>>>>> On Thu, Sep 30, 2010, Zachary Amsden wrote about "Re: TSC in nested SVM and VMX":
>>>>>
>>>>>            
>>>>>> 1)  When reading an MSR, we are not emulating the L2 guest; we are
>>>>>> DIRECTLY reading the MSR for the L1 emulation.  Any emulation of the L2
>>>>>> guest is actually done by the code running /inside/ the L1 emulation, so
>>>>>> MSR reads for the L2 guest are handled by L1, and MSR reads for the L1
>>>>>> guest are handled by L0, which is this code.
>>>>>> ...
>>>>>> So if we are currently running nested, the L1 tsc_offset is stored in
>>>>>> the nested.hsave field; the vmcb which is active is polluted by the L2
>>>>>> guest offset, which would be incorrect to return to the L1 emulation.
>>>>>>
>>>>>>              
>>>>> Thanks for the detailed explanation.
>>>>>
>>>>> It seems, then, that the nested VMX logic is somewhat different from that
>>>>> of the nested SVM. In nested VMX, if a function gets called when running
>>>>> L1, the current VMCS will be that of L1 (aka vmcs01), not of its guest L2
>>>>> (and I'm not even sure *which* L2 that would be when there are multiple
>>>>> L2 guests for the one L1).
>>>>>
>>>>>            
>>>> If the #vmexit comes while you're in L1, everything works on the L1's vmcb. If you hit it while in L2, everything works on the L2's vmcb unless special attention is taken.
>>>>
>>>> The reason behind the TSC shift is very simple. With the tsc_offset setting we're trying to adjust the L1's offset. Adjusting the L1's offset means we need to adjust L1 and L2 alike, as the virtual L2's offset == L1 offset + vmcb L2 offset, because L2's TSC is also offset by the amount L1 is.
>>>>
>>>> So basically what happens is:
>>>>
>>>> nested VMRUN:
>>>>
>>>>         svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
>>>>
>>>> please note the +=!
>>>>
>>>>
>>>> svm_write_tsc_offset:
>>>>
>>>> This gets called when we really want to set only the current level's TSC offset, because the guest issued a TSC write. In L2 this means the L2's value.
>>>>
>>>>         if (is_nested(svm)) {
>>>>                 g_tsc_offset = svm->vmcb->control.tsc_offset -
>>>>                                svm->nested.hsave->control.tsc_offset;
>>>>
>>>> Remember the difference between L1 and L2.
>>>>
>>>>                 svm->nested.hsave->control.tsc_offset = offset;
>>>>
>>>> Set L1 to the new offset
>>>>
>>>>         }
>>>>
>>>>         svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
>>>>
>>>> Set L2 to new offset + delta.
>>>>
>>>>
>>>> So what this function does is that it treats TSC writes as L1 writes even while in L2 and adjusts L2 accordingly. Joerg, this sounds fishy to me. Are you sure this is intended and works when L1 doesn't intercept MSR writes to TSC?
>>>>
>>>>          
>>> L1 must intercept MSR writes to TSC for this to work.  It does, so all is well.
>>>        
>> Sure, in nested kvm all is fine because we never hit the above code path. But other hypervisors

We do hit that code path, and it works fine because it is correct.  It 
only applies to L1 TSC writes.

An L2 guest writing to the TSC MSR will not run this code path.  Instead
(assuming the L1 guest traps such writes), the write triggers a #VMEXIT,
which should be forwarded to the L1 guest.  In response, the L1 guest
has two choices:

1) adjust the TSC offset in the vmcb for the L2 guest.
2) rewrite the TSC instead, triggering the above code path, which 
follows the standard case (as it is not running nested), adjusting the 
TSC offset for the L1 guest only.

In both cases, this adjusted offset will be added to the L2 guest offset 
when it resumes in the nested #VMRUN.
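
To make the arithmetic concrete, here is a rough user-space sketch of the
two choices.  It is not the kernel code: the struct and all function names
are invented for illustration, and the real state lives in the vmcb and
nested.hsave rather than in one flat struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; none of these names exist in KVM. */
struct tsc_model {
	uint64_t l1_offset;	/* offset L0 applies to L1 */
	uint64_t l2_offset;	/* offset L1 put in the vmcb it built for L2 */
	uint64_t hw_offset;	/* offset currently programmed in hardware */
	int nested;		/* nonzero while L2 is running */
};

/* Nested VMRUN: L2 runs with the L1 offset plus its own offset. */
static void model_nested_vmrun(struct tsc_model *m)
{
	assert(!m->nested);
	m->hw_offset = m->l1_offset + m->l2_offset;
	m->nested = 1;
}

/* Nested #VMEXIT: control returns to L1, so only the L1 offset applies. */
static void model_nested_vmexit(struct tsc_model *m)
{
	assert(m->nested);
	m->hw_offset = m->l1_offset;
	m->nested = 0;
}

/* Choice 1: L1 adjusts the offset in the vmcb for the L2 guest. */
static void model_l1_adjusts_l2(struct tsc_model *m, uint64_t delta)
{
	assert(!m->nested);
	m->l2_offset += delta;
}

/* Choice 2: L1 rewrites its own TSC; L0 takes the non-nested path. */
static void model_l1_rewrites_tsc(struct tsc_model *m, uint64_t new_l1_offset)
{
	assert(!m->nested);
	m->l1_offset = new_l1_offset;
	m->hw_offset = new_l1_offset;
}
```

In this model, starting from an L1 offset of 1000 and an L2 offset of 50,
an adjustment of 7 made through either choice leaves the hardware offset
at 1057 after the next nested VMRUN.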

The only time the above code path follows the nested case is when both
of the following hold:

1) A nested L2 guest is running
2) The L0 emulation of L1 requires adjusting the hardware TSC offset 
because of a hardware CPU TSC change
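
A stripped-down model of that nested branch, following the quoted
svm_write_tsc_offset logic, might look like the sketch below.  The struct
and function names are made up for this example; only the arithmetic
mirrors the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant svm state. */
struct svm_model {
	uint64_t vmcb_tsc_offset;	/* active vmcb; L1 + L2 offset while nested */
	uint64_t hsave_tsc_offset;	/* saved L1 offset while nested */
	int nested;			/* nonzero while L2 is running */
};

/* Set a new L1 offset, keeping L2's delta relative to L1 unchanged. */
static void model_write_tsc_offset(struct svm_model *svm, uint64_t offset)
{
	uint64_t g_tsc_offset = 0;

	if (svm->nested) {
		/* Remember the difference between L1 and L2. */
		g_tsc_offset = svm->vmcb_tsc_offset - svm->hsave_tsc_offset;
		/* Set L1 to the new offset. */
		svm->hsave_tsc_offset = offset;
	}

	/* Active offset is the new L1 offset plus the preserved delta. */
	svm->vmcb_tsc_offset = offset + g_tsc_offset;

	/* While nested, the L1-to-L2 delta must survive the update. */
	if (svm->nested)
		assert(svm->vmcb_tsc_offset - svm->hsave_tsc_offset == g_tsc_offset);
}
```

For example, with a saved L1 offset of 1000 and an active offset of 1050
while nested, writing a new offset of 2000 leaves the saved L1 offset at
2000 and the active offset at 2050, so L2 still sees the same 50-cycle
delta relative to L1.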

>> might not intercept TSC writes, which should only be reflected in an L2 TSC offset change, no?

Other hypervisors are irrelevant here.  The L1 hypervisor, whatever it
may be, may or may not intercept TSC writes.  If it does not, it does
not correctly virtualize L2.

We correctly virtualize the L1 guest's mistake by allowing the L2 guest 
in that case to rewrite the TSC for the L1 guest.  This may cause some 
slight disruption for L1's correct timekeeping...

Zach

Thread overview: 11+ messages
2010-09-30 22:50 TSC in nested SVM and VMX Nadav Har'El
2010-10-01  4:38 ` Zachary Amsden
2010-10-01 11:21   ` Nadav Har'El
2010-10-01 14:46     ` Alexander Graf
2010-10-01 19:22       ` Zachary Amsden
2010-10-02  1:56         ` Alexander Graf
2010-10-02 11:19           ` Alexander Graf
2010-10-02 22:46             ` Zachary Amsden [this message]
2010-10-03  0:01               ` Alexander Graf
2010-10-03  8:35                 ` Nadav Har'El
2010-10-03  9:01                   ` Alexander Graf
