On 8/6/2013 2:12 AM, Jan Beulich wrote:
>>>> On 06.08.13 at 04:27, Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
> wrote:
>> Hi All,
>>
>> While testing a nested VM with the latest Xen on an AMD system, I am
>> running into an issue where the L2 guest (Linux) seems to get stuck right
>> after loading the kernel. When using "xl debug-keys d" to dump registers,
>> the L2 guest RIP is always at the instruction that writes the CR0.CD bit.
>> Moreover, once the L2 guest starts and gets stuck, the L0 Dom0 becomes
>> very slow until I kill the L2 guest.
>>
>> After looking into the HVM code for handling CR0 (i.e.
>> xen/arch/x86/hvm/hvm.c: hvm_set_cr0()), I see that the code issues a
>> local cache flush on every core when the L2 guest sets the CR0.CD bit.
>> (Please see the code snippet below.)
>>
>>           if ( (value & X86_CR0_CD) && !(value & X86_CR0_NW) )
>>           {
>>               /* Entering no fill cache mode. */
>>               spin_lock(&v->domain->arch.hvm_domain.uc_lock);
>>               v->arch.hvm_vcpu.cache_mode = NO_FILL_CACHE_MODE;
>>
>>               if ( !v->domain->arch.hvm_domain.is_in_uc_mode )
>>               {
>>                   /* Flush physical caches. */
>>                   on_each_cpu(local_flush_cache, NULL, 1); /* <--- HERE */
>>                   hvm_set_uc_mode(v, 1);
>>               }
>>               spin_unlock(&v->domain->arch.hvm_domain.uc_lock);
>>           }
>>
>> When I comment out that line, the issue goes away. Is this line
>> necessary? Why do we need to flush the caches on all CPU cores when the
>> CR0.CD bit only applies to a particular core?
> Doing the flush only on the local CPU would imply that once the
> affected vCPU migrates to another pCPU, flushing would _then_
> need to be done there too. Tracking this would clearly add
> complexity here.
>
> Furthermore, the "UC mode" is being entered on the domain as a
> whole, i.e. all the pCPU-s that the domain is actively running on
> would need immediate flushing, and all pCPU-s any of the vCPU-s
> would migrate to subsequently would need deferred
> flushing.
>
> That said, I still can't see how the flushing here would have such
> a dramatic effect: it's a one-time thing, when UC mode first gets
> entered by a domain. So unless CR0.CD gets flipped back and
> forth by a guest, there shouldn't be more than one flush (or there's
> a logic error somewhere else).
>
> Finally, the need for that code as a whole is being questioned in
> the context of XSA-60. I would certainly favor (at least on the SVM
> side) handling CR0.CD per vCPU instead of per domain, as long
> as there are no requirements that CR0.CD be set consistently
> across multiple CPUs (e.g. within a package; on Intel CPUs I'm
> being told it's a hard requirement to be consistent at least
> between sibling hyperthreads, meaning that we can't rip out the
> current logic altogether in favor of a CR0.CD based solution).
>
> Jan
>
>
Somehow the problem went away when I updated the hypervisor in both L0
and L1, and I can no longer reproduce the issue. At one point, while I was
trying to debug the issue using "hvm_debug", I saw messages showing the
CD bit being flipped back and forth:

(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = 8005003b
(XEN) [HVM:1.3] <hvm_set_cr0> Update CR0 value = c005003b

Thanks for the details. I'll keep monitoring this going forward.

Suravee