From: Suravee Suthikulanit
Subject: Re: x86/AMD: Nested VM failed to boot L2 guest due to setting/clearing CR0.CD bit
Date: Tue, 6 Aug 2013 12:55:55 -0500
Message-ID: <520138AB.8040600@amd.com>
In-Reply-To: <5200BE0702000078000E9862@nat28.tlf.novell.com>
References: <52005F0C.4070409@amd.com> <5200BE0702000078000E9862@nat28.tlf.novell.com>
To: Jan Beulich
Cc: xen-devel, Christoph Egger, Jun Nakajima
List-Id: xen-devel@lists.xenproject.org

On 8/6/2013 2:12 AM, Jan Beulich wrote:
> On 06.08.13 at 04:27, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> Hi All,
>>
>> While testing nested VM with the latest Xen on an AMD system, I am running
>> into an issue where the L2 guest (Linux) seems to get stuck right after
>> loading the kernel. When using "xl debug-keys d" to dump registers, the
>> L2 guest RIP is always at the instruction which tries to write the CR0.CD
>> bit. Besides, once the L2 guest has started and gotten stuck, L0 Dom0
>> becomes very slow until I kill the L2 guest.
>>
>> After looking into the hvm code for handling CR0 (i.e.
>> xen/arch/x86/hvm/hvm.c: hvm_set_cr0()), I see that the code tries to
>> issue a local cache flush on all the cores when the L2 guest is setting
>> the CR0.CD bit. (Please see the code snippet below.)
>>
>>          if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
>>          {
>>              /* Entering no fill cache mode. */
>>              spin_lock(&v->domain->arch.hvm_domain.uc_lock);
>>              v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;
>>
>>              if ( !v->domain->arch.hvm_domain.is_in_uc_mode )
>>              {
>>                  /* Flush physical caches. */
>> ---> HERE       on_each_cpu(local_flush_cache, NULL, 1);
>>                  hvm_set_uc_mode(v, 1);
>>              }
>>              spin_unlock(&v->domain->arch.hvm_domain.uc_lock);
>>          }
>>
>> When I comment out the line, the issue goes away. Is this line
>> necessary?
>> Why do we need to flush all the CPU cores when the CR0.CD bit only applies
>> to a particular core?
> Doing the flush only on the local CPU would imply that once the
> affected vCPU migrates to another pCPU, flushing would _then_
> need to be done there too. Tracking this would clearly add
> complexity here.
>
> Furthermore, the "UC mode" is being entered on the domain as a
> whole, i.e. all the pCPU-s that the domain is actively running on
> would need immediate flushing, and all pCPU-s any of the vCPU-s
> would migrate to subsequently would need deferred
> flushing.
>
> That said, I still can't see how the flushing here would have this
> dramatic an effect: It's a one-time thing, when UC mode first gets
> entered by a domain. So unless CR0.CD gets flipped back and
> forth by a guest, there shouldn't be more than one flush (or there's
> a logic error somewhere else).
>
> Finally, the need for that code as a whole is under question in the
> context of XSA-60. I would certainly favor (at least on the SVM
> side) handling CR0.CD per vCPU instead of per domain, as long
> as there are no requirements that CR0.CD be set consistently
> across multiple CPUs (e.g. within a package; on Intel CPUs I'm
> being told it's a hard requirement to be consistent at least
> between sibling hyperthreads, meaning that we can't rip out the
> current logic altogether in favor of a CR0.CD based solution).
>
> Jan
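
To make the "one flush unless CR0.CD is flipped back and forth" reasoning concrete, here is a minimal stand-alone sketch of the bookkeeping, not the actual Xen code. The names uc_model, model_set_cr0 and count_flushes are hypothetical, a counter stands in for on_each_cpu(local_flush_cache, NULL, 1), and the exit path (the domain leaves UC mode once no vCPU is in no-fill mode) is an assumption that is not shown in the quoted snippet.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_NW (1u << 29)   /* Not Write-through */
#define X86_CR0_CD (1u << 30)   /* Cache Disable */
#define MAX_VCPUS  4            /* arbitrary size for this sketch */

/* Hypothetical, heavily simplified stand-in for a domain's UC state. */
struct uc_model {
    bool in_uc_mode;              /* models is_in_uc_mode */
    bool vcpu_no_fill[MAX_VCPUS]; /* models cache_mode == NO_FILL_CACHE_MODE */
    unsigned int flushes;         /* number of global cache flushes issued */
};

/* Apply one guest CR0 write issued by vCPU `vcpu` to the model. */
void model_set_cr0(struct uc_model *m, unsigned int vcpu, uint32_t value)
{
    unsigned int i;

    if ((value & X86_CR0_CD) && !(value & X86_CR0_NW)) {
        /* Entering no-fill cache mode. */
        m->vcpu_no_fill[vcpu] = true;
        if (!m->in_uc_mode) {
            m->flushes++;         /* stands in for the flush on every pCPU */
            m->in_uc_mode = true;
        }
    } else if (!(value & (X86_CR0_CD | X86_CR0_NW)) && m->vcpu_no_fill[vcpu]) {
        /* Assumed exit path: leave UC mode once no vCPU is in no-fill mode. */
        m->vcpu_no_fill[vcpu] = false;
        for (i = 0; i < MAX_VCPUS; i++)
            if (m->vcpu_no_fill[i])
                return;
        m->in_uc_mode = false;
    }
}

/* Replay a trace of CR0 writes from vCPU 0 and return the flush count. */
unsigned int count_flushes(const uint32_t *trace, unsigned int n)
{
    struct uc_model m = { 0 };
    unsigned int i;

    for (i = 0; i < n; i++)
        model_set_cr0(&m, 0, trace[i]);
    return m.flushes;
}
```

Under these assumptions, a single-vCPU guest that sets CD once costs one flush no matter how often it rewrites the same value, while a guest that alternates between CD clear and CD set pays one global flush per transition into no-fill mode.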


Somehow the problem went away when I updated the hypervisor in both L0
and L1, and I can no longer reproduce the issue. At one point, when I was
trying to debug the issue using "hvm_debug", I was seeing messages
showing the CD bit being flipped back and forth.

(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
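
For reference, the two values above differ only in bit 30, which is CR0.CD; bit 29 (CR0.NW) is clear in both. A small sketch for decoding them is below. The helper names cr0_cd, cr0_nw, cr0_is_no_fill and print_cr0 are made up for this example and are not Xen functions.

```c
#include <stdint.h>
#include <stdio.h>

#define X86_CR0_NW (1u << 29)   /* Not Write-through */
#define X86_CR0_CD (1u << 30)   /* Cache Disable */

/* Return 1 if CR0.CD is set in the given value, else 0. */
int cr0_cd(uint32_t cr0)
{
    return !!(cr0 & X86_CR0_CD);
}

/* Return 1 if CR0.NW is set in the given value, else 0. */
int cr0_nw(uint32_t cr0)
{
    return !!(cr0 & X86_CR0_NW);
}

/* Same condition as the quoted hvm_set_cr0() check: CD=1 and NW=0. */
int cr0_is_no_fill(uint32_t cr0)
{
    return cr0_cd(cr0) && !cr0_nw(cr0);
}

/* Print the cache-control bits of one logged CR0 value. */
void print_cr0(uint32_t cr0)
{
    printf("CR0=%08x CD=%d NW=%d no_fill=%d\n",
           (unsigned int)cr0, cr0_cd(cr0), cr0_nw(cr0), cr0_is_no_fill(cr0));
}
```

Calling print_cr0() on 0x8005003b and 0xc005003b reports CD=0 for the first and CD=1 with NW=0 for the second, so each write of c005003b satisfies the "entering no fill cache mode" condition.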

Thanks for the details. I'll keep monitoring this in the future.

Suravee
