From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH v11 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests Date: Tue, 23 Sep 2014 14:57:45 -0400 Message-ID: <5421C2A9.6010701@oracle.com> References: <1411430281-6132-1-git-send-email-boris.ostrovsky@oracle.com> <1411430281-6132-17-git-send-email-boris.ostrovsky@oracle.com> <20140923183139.GR3007@laptop.dumpdata.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20140923183139.GR3007@laptop.dumpdata.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Konrad Rzeszutek Wilk Cc: kevin.tian@intel.com, keir@xen.org, jbeulich@suse.com, jun.nakajima@intel.com, andrew.cooper3@citrix.com, tim@xen.org, dietmar.hahn@ts.fujitsu.com, xen-devel@lists.xen.org, Aravind.Gopalakrishnan@amd.com, suravee.suthikulpanit@amd.com List-Id: xen-devel@lists.xenproject.org On 09/23/2014 02:31 PM, Konrad Rzeszutek Wilk wrote: >> int vpmu_do_interrupt(struct cpu_user_regs *regs) >> { >> - struct vcpu *v = current; >> - struct vpmu_struct *vpmu = vcpu_vpmu(v); >> + struct vcpu *sampled = current, *sampling; >> + struct vpmu_struct *vpmu; >> + >> + /* dom0 will handle interrupt for special domains (e.g. idle domain) */ >> + if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED ) >> + { >> + sampling = choose_hwdom_vcpu(); >> + if ( !sampling ) >> + return 0; >> + } >> + else >> + sampling = sampled; >> + >> + vpmu = vcpu_vpmu(sampling); >> + if ( !is_hvm_domain(sampling->domain) ) >> + { >> + /* PV(H) guest */ >> + const struct cpu_user_regs *cur_regs; >> + >> + if ( !vpmu->xenpmu_data ) >> + return 0; >> + >> + if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED ) >> + return 1; >> + >> + if ( is_pvh_domain(sampled->domain) && >> + !vpmu->arch_vpmu_ops->do_interrupt(regs) ) >> + return 0; >> + >> + /* PV guest will be reading PMU MSRs from xenpmu_data */ > OK, with that nice comment I was thinking that: >> + vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); >> + vpmu->arch_vpmu_ops->arch_vpmu_save(sampling); > would be the one that writes the values to pmu.r.regs, but so far > in this patchset that is not the case. It does write to the pmu.c (contexts) > but that is not .. > >> + vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); >> + >> + /* Store appropriate registers in xenpmu_data */ >> + if ( is_pv_32bit_domain(sampling->domain) ) >> + { >> + /* >> + * 32-bit dom0 cannot process Xen's addresses (which are 64 bit) >> + * and therefore we treat it the same way as a non-privileged >> + * PV 32-bit domain. >> + */ >> + struct compat_pmu_regs *cmp; >> + >> + cur_regs = guest_cpu_user_regs(); >> + >> + cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs; > .. what we read from here. Did I miss a patch? > >> + cmp->eip = cur_regs->rip; >> + cmp->esp = cur_regs->rsp; >> + cmp->cs = cur_regs->cs; >> + if ( (cmp->cs & 3) == 1 ) >> + cmp->cs &= ~3; >> + } >> + else >> + { >> + struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs; > Ditto here. > > Perhaps a comment stating _who_ (or _what_) is responsible for populating > the 'pmu.r.regs' structure? There are different registers --- PMU MSRs and GPRs. Comment "PV guest will be reading PMU MSRs from xenpmu_data" was in reference to guest accessing MSRs that are cached in pmu.c(ontext) as opposed to doing rd/wrmsr and trapping to hypervisor.. That's why we do arch_vpmu_save(): to save PMU MSRs into the shared page. pmu.r(egisters) are sampled VCPU's GPRs and are filled in this routine. Does this answer your question? > >> + >> + /* Non-privileged domains are always in XENPMU_MODE_SELF mode */ >> + if ( (vpmu_mode & XENPMU_MODE_SELF) || >> + (!is_hardware_domain(sampled->domain) && >> + !is_idle_vcpu(sampled)) ) >> + cur_regs = guest_cpu_user_regs(); >> + else >> + cur_regs = regs; >> + >> + r->rip = cur_regs->rip; >> + r->rsp = cur_regs->rsp; >> + >> + if ( !is_pvh_domain(sampled->domain) ) >> + { >> + r->cs = cur_regs->cs; >> + if ( sampled->arch.flags & TF_kernel_mode ) >> + r->cs &= ~3; >> + } >> + else >> + { >> + struct segment_register seg_cs; >> + >> + hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs); >> + r->cs = seg_cs.sel; >> + } >> + } >> + >> + vpmu->xenpmu_data->domain_id = DOMID_SELF; >> + vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id; >> + vpmu->xenpmu_data->pcpu_id = smp_processor_id(); >> + >> + vpmu->xenpmu_data->pmu_flags |= PMU_CACHED; >> + vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED; > Right, especially as core2_vpmu_do_interrupt (which is called earlier) > does: > > vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED; > apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); > > So here we mask it > >> + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); > Hm, so for Intel we would do _two_ apic_writes ? Yes, this is due to Intel's masking of LVTPC during interrupt (AMD does not do this). So the Intel's handler will re-enable PMU interrupts. I could have made Intel's apic_write conditional upon PV(H) vs. HVM but I thought that do_interrupt() should always return an unmasked PMU in generic code so that both Intel and AMD "look" the same. I'll ignore the rest of your comments since I think that's what you told me to ;-) -boris > > Anyhow, we mask it here.. > .. snip.. >> case XENPMU_lvtpc_set: >> - if ( current->arch.vpmu.xenpmu_data == NULL ) >> + curr = current; >> + if ( curr->arch.vpmu.xenpmu_data == NULL ) >> return -EINVAL; >> - vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc); >> + vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc); >> + ret = 0; >> + break; >> + >> + case XENPMU_flush: >> + curr = current; >> + curr->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED; >> + vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc); > And this code does: > > +void vpmu_lvtpc_update(uint32_t val) > +{ > + struct vpmu_struct *vpmu = vcpu_vpmu(current); > + > + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); > + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); > +} > + > > And we over-write vpmu->hw_lapic_lvtpc with the new value. The 'val' > here is important as it may have APIC_LVT_MASKED or not. We get that > from ' curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtp ' but I am > not seeing who sets it. I was thinking the XENPMU_lvtpc_set but that > calls vpmu_lvtpc_update which will update the vpmu->hw_lapic_lvtpc > with the APIC_LVT_MASKED if set. > > So who sets it? > > Anyhow, with it being zero, that means the APIC_LVT_MASKED is unset, so > interrupts are free to go, and here: > >> + vpmu_load(curr); > We do another 'apic_write_around' with the hw_lapic_lvtpc value. So two > apci_writes to unmask it. (Or mask it back if the l.lapic_lvtpc has > APIC_LVT_MASKED set). > > Should the XENPMU_flush check if: > > - The PMU_CACHED bit is set (and if not then -EAGAIN?) > - The APIC_LVT_MASKED was set (and if so, then unmaks it). If it is > not set, then no need to do apic_write? > > ? >> ret = 0; >> break; >> } >> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h >> index 68a5fb8..a1886a5 100644 >> --- a/xen/include/public/pmu.h >> +++ b/xen/include/public/pmu.h >> @@ -28,6 +28,7 @@ >> #define XENPMU_init 4 >> #define XENPMU_finish 5 >> #define XENPMU_lvtpc_set 6 >> +#define XENPMU_flush 7 /* Write cached MSR values to HW */ >> /* ` } */ >> >> /* Parameters structure for HYPERVISOR_xenpmu_op call */ >> @@ -61,6 +62,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t); >> */ >> #define XENPMU_FEATURE_INTEL_BTS 1 >> >> +/* >> + * PMU MSRs are cached in the context so the PV guest doesn't need to trap to >> + * the hypervisor >> + */ >> +#define PMU_CACHED 1 >> + >> /* Shared between hypervisor and PV domain */ >> struct xen_pmu_data { >> uint32_t domain_id; >> -- >> 1.8.1.4 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel