From mboxrd@z Thu Jan  1 00:00:00 1970
From: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
Subject: Re: x86/AMD: Nested VM failed to boot L2 guest due to
 setting/clearing CR0.CD bit
Date: Tue, 6 Aug 2013 12:55:55 -0500
Message-ID: <520138AB.8040600@amd.com>
References: <52005F0C.4070409@amd.com>
	<5200BE0702000078000E9862@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0199901890324450005=="
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <Suravee.Suthikulpanit@amd.com>) id 1V6lUM-0001E4-Pn
	for xen-devel@lists.xenproject.org; Tue, 06 Aug 2013 17:56:07 +0000
In-Reply-To: <5200BE0702000078000E9862@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>, Christoph Egger <chegger@amazon.de>, Jun Nakajima <jun.nakajima@intel.com>
List-Id: xen-devel@lists.xenproject.org

--===============0199901890324450005==
Content-Type: multipart/alternative;
	boundary="------------030107000109030300000206"

--------------030107000109030300000206
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit

On 8/6/2013 2:12 AM, Jan Beulich wrote:
>>>> On 06.08.13 at 04:27, Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
> wrote:
>> Hi All,
>>
>> While testing nested VMs with the latest Xen on an AMD system, I am running
>> into an issue where
>> the L2 guest (Linux) seems to get stuck right after loading the kernel. When
>> using "xl debug-keys d" to dump registers,
>> the L2 guest RIP is always at the instruction which tries to write the CR0.CD
>> bit.  Besides, once the L2 guest has started and
>> got stuck, L0 Dom0 becomes very slow until I kill the L2 guest.
>>
>> After looking into the HVM code for handling CR0 (i.e.
>> xen/arch/x86/hvm/hvm.c: hvm_set_cr0()),
>> I see that the code issues a local cache flush on all the cores when
>> the L2 guest is
>> setting the CR0.CD bit. (Please see the code snippet below.)
>>
>>           if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
>>           {
>>               /* Entering no fill cache mode. */
>>               spin_lock(&v->domain->arch.hvm_domain.uc_lock);
>>               v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;
>>
>>               if ( !v->domain->arch.hvm_domain.is_in_uc_mode )
>>               {
>>                   /* Flush physical caches. */
>> ---> HERE       on_each_cpu(local_flush_cache, NULL, 1);
>>                   hvm_set_uc_mode(v, 1);
>>               }
>>               spin_unlock(&v->domain->arch.hvm_domain.uc_lock);
>>           }
>>
>> When I try to comment out the line, the issue goes away.  Is this line
>> necessary?
>> Why do we need to flush all the cpu cores when the CR0.CD bit only applies
>> to a particular core?
> Doing the flush only on the local CPU would imply that once the
> affected vCPU migrates to another pCPU, flushing would _then_
> need to be done there too. Tracking this would clearly add
> complexity here.
>
> Furthermore, the "UC mode" is being entered on the domain as a
> whole, i.e. all the pCPU-s that the domain is actively running on
> would need immediate flushing, and all pCPU-s any of the vCPU-s
> would migrate to subsequently would need deferred
> flushing.
>
> That said, I still can't see how the flushing here would have this
> dramatic an effect: It's a one-time thing, when UC mode first gets
> entered by a domain. So unless CR0.CD gets flipped back and
> forth by a guest, there shouldn't be more than one flush (or there's
> a logic error somewhere else).
>
> Finally, the need for that code as a whole is under question in the
> context of XSA-60. I would certainly favor (at least on the SVM
> side) to handle CR0.CD per vCPU instead of per domain, as long
> as there are no requirements that CR0.CD be set consistently
> across multiple CPUs (e.g. within a package; on Intel CPUs I'm
> being told it's a hard requirement to be consistent at least
> between sibling hyperthreads, meaning that we can't rip out the
> current logic altogether in favor of a CR0.CD based solution).
>
> Jan
>
>
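To check my understanding of the "one flush unless CD is flipped back and forth" point, here is a toy model of the domain-wide UC mode logic. It is only a sketch, not the Xen code: the enter path follows the snippet quoted above, while the exit path (the domain leaves UC mode once no vCPU is in no-fill mode) is my assumption, and all names (toy_domain, toy_set_cr0, MAX_VCPUS) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_NW    (1u << 29)   /* Not Write-through */
#define CR0_CD    (1u << 30)   /* Cache Disable */
#define MAX_VCPUS 4            /* hypothetical, for the toy model only */

/* Hypothetical stand-in for the per-domain state; not a Xen structure. */
struct toy_domain {
    bool in_uc_mode;               /* mirrors is_in_uc_mode in the snippet */
    bool vcpu_no_fill[MAX_VCPUS];  /* per-vCPU "no fill cache mode" flag */
    unsigned int flushes;          /* counts the all-CPU cache flushes */
};

static bool any_vcpu_no_fill(const struct toy_domain *d)
{
    for (int i = 0; i < MAX_VCPUS; i++)
        if (d->vcpu_no_fill[i])
            return true;
    return false;
}

/* Model a guest CR0 write on one vCPU. */
static void toy_set_cr0(struct toy_domain *d, int vcpu, uint32_t value)
{
    assert(vcpu >= 0 && vcpu < MAX_VCPUS);

    if ((value & CR0_CD) && !(value & CR0_NW)) {
        /* Entering no fill cache mode (as in the quoted snippet). */
        d->vcpu_no_fill[vcpu] = true;
        if (!d->in_uc_mode) {
            d->flushes++;          /* stands in for on_each_cpu(...) */
            d->in_uc_mode = true;
        }
    } else if (!(value & (CR0_CD | CR0_NW)) && d->vcpu_no_fill[vcpu]) {
        /* ASSUMPTION: exit path, not shown in the quoted snippet. */
        d->vcpu_no_fill[vcpu] = false;
        if (!any_vcpu_no_fill(d))
            d->in_uc_mode = false;
    }
}
```

Under these assumptions, two vCPUs that each set CD once cause a single flush, whereas one vCPU that sets, clears, and sets CD again causes a flush on every set, since the domain drops out of UC mode in between.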
Somehow the problem went away when I updated the hypervisor in both L0
and L1, and I can no longer reproduce the issue. At one point, when I was
trying to debug the issue using "hvm_debug", I was seeing messages
where the CD bit was flipped back and forth:

(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
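
For reference, the two values in the log differ only in bit 30, which is CR0.CD. A small sketch for decoding them is below; the bit positions are the architectural CR0 flags, while the helper name cr0_describe and the table are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cr0_flag {
    uint32_t mask;
    const char *name;
};

/* Architectural CR0 flag bits, lowest bit first. */
static const struct cr0_flag cr0_flags[] = {
    { 1u << 0,  "PE" }, { 1u << 1,  "MP" }, { 1u << 2,  "EM" },
    { 1u << 3,  "TS" }, { 1u << 4,  "ET" }, { 1u << 5,  "NE" },
    { 1u << 16, "WP" }, { 1u << 18, "AM" }, { 1u << 29, "NW" },
    { 1u << 30, "CD" }, { 1u << 31, "PG" },
};

/* Write the names of the flags set in 'value' into 'buf', space separated.
 * Hypothetical helper; output is truncated if 'buf' is too small. */
static void cr0_describe(uint32_t value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(cr0_flags) / sizeof(cr0_flags[0]); i++) {
        if (!(value & cr0_flags[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, cr0_flags[i].name, len - strlen(buf) - 1);
    }
}
```

With a 64-byte buffer, cr0_describe(0x8005003b, ...) gives "PE MP TS ET NE WP AM PG" and cr0_describe(0xc005003b, ...) gives "PE MP TS ET NE WP AM CD PG", so NW stays clear and only CD toggles between the writes.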

Thanks for the details. I'll keep monitoring this in the future.

Suravee

--------------030107000109030300000206--


--===============0199901890324450005==--