From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
Date: Wed, 25 Sep 2013 11:52:22 -0400
Message-ID: <524306B6.4080808@oracle.com>
References: <1379670132-1748-1-git-send-email-boris.ostrovsky@oracle.com>
 <1379670132-1748-12-git-send-email-boris.ostrovsky@oracle.com>
 <5243106702000078000F6524@nat28.tlf.novell.com>
 <5242F5CD.3000804@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5242F5CD.3000804@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: jun.nakajima@intel.com, Jan Beulich, George.Dunlap@eu.citrix.com,
 jacob.shin@amd.com, eddie.dong@intel.com, dietmar.hahn@ts.fujitsu.com,
 suravee.suthikulpanit@amd.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 09/25/2013 10:40 AM, Andrew Cooper wrote:
> On 25/09/13 15:33, Jan Beulich wrote:
>>>>> On 20.09.13 at 11:42, Boris Ostrovsky wrote:
>>> Add support for handling PMU interrupts for PV guests, make these interrupts
>>> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the
>>> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0).
>> Is using NMIs here a necessity? I guess not, in which case I'd really
>> like this to be a (perhaps even non-default) option controllable via
>> command line option.
>>
>>> - * This interrupt handles performance counters interrupt
>>> - */
>>> -
>>> -void pmu_apic_interrupt(struct cpu_user_regs *regs)
>>> -{
>>> -    ack_APIC_irq();
>>> -    vpmu_do_interrupt(regs);
>>> -}
>> So this was the only caller of vpmu_do_interrupt(); no new one gets
>> added in this patch afaics, and I don't recall having seen addition of
>> another caller in earlier patches. What's the deal?
>>
>>> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>>>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>>  {
>>>      struct vcpu *v = current;
>>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>
>>> -    if ( vpmu->arch_vpmu_ops )
>>> +    struct vpmu_struct *vpmu;
>>> +
>>> +    /* dom0 will handle this interrupt */
>>> +    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
>>> +         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
>>> +    {
>>> +        if ( smp_processor_id() >= dom0->max_vcpus )
>>> +            return 0;
>>> +        v = dom0->vcpu[smp_processor_id()];
>>> +    }
>>> +
>>> +    vpmu = vcpu_vpmu(v);
>>> +    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>>> +        return 0;
>>> +
>>> +    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
>>> +    {
>>> +        /* PV guest or dom0 is doing system profiling */
>>> +        void *p;
>>> +        struct cpu_user_regs *gregs;
>>> +
>>> +        p = &v->arch.vpmu.xenpmu_data->pmu.regs;
>>> +
>>> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
>>> +        vpmu_save_force(v);
>>> +
>>> +        /* Store appropriate registers in xenpmu_data
>>> +         *
>>> +         * Note: '!current->is_running' is possible when 'set_current(next)'
>>> +         * for the (HVM) guest has been called but 'reset_stack_and_jump()'
>>> +         * has not (i.e. the guest is not actually running yet).
>>> +         */
>>> +        if ( !is_hvm_domain(current->domain) ||
>>> +             ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) )
>>> +        {
>>> +            /*
>>> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>>> +             * and therefore we treat it the same way as a non-privileged
>>> +             * PV 32-bit domain.
>>> +             */
>>> +            if ( is_pv_32bit_domain(current->domain) )
>>> +            {
>>> +                struct compat_cpu_user_regs cmp;
>>> +
>>> +                gregs = guest_cpu_user_regs();
>>> +                XLAT_cpu_user_regs(&cmp, gregs);
>>> +                memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs));
>>> +            }
>>> +            else if ( (current->domain != dom0) && !is_idle_vcpu(current) &&
>>> +                      !(vpmu_mode & XENPMU_MODE_PRIV) )
>>> +            {
>>> +                /* PV guest */
>>> +                gregs = guest_cpu_user_regs();
>>> +                memcpy(p, gregs, sizeof(struct cpu_user_regs));
>>> +            }
>>> +            else
>>> +                memcpy(p, regs, sizeof(struct cpu_user_regs));
>>> +        }
>>> +        else
>>> +        {
>>> +            /* HVM guest */
>>> +            struct segment_register cs;
>>> +
>>> +            gregs = guest_cpu_user_regs();
>>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>>> +            gregs->cs = cs.attr.fields.dpl;
>>> +
>>> +            memcpy(p, gregs, sizeof(struct cpu_user_regs));
>>> +        }
>>> +
>>> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
>>> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
>>> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
>>> +
>>> +        raise_softirq(PMU_SOFTIRQ);
>>> +        vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH);
>>> +
>>> +        return 1;
>>> +    }
>>> +    else if ( vpmu->arch_vpmu_ops )
>>>      {
>>> -        struct vlapic *vlapic = vcpu_vlapic(v);
>>> +        /* HVM guest */
>>> +        struct vlapic *vlapic;
>>>          u32 vlapic_lvtpc;
>>>          unsigned char int_vec;
>>>
>>>          if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>>>              return 0;
>>>
>>> +        vlapic = vcpu_vlapic(v);
>>>          if ( !is_vlapic_lvtpc_enabled(vlapic) )
>>>              return 1;
>>>
>> Assuming the plan is to run this in NMI context - this is _a lot_ of
>> stuff you want to do. Did you carefully audit all paths for being
>> NMI-safe?
>>
>> Jan
> vpmu_save() is not safe from an NMI context, as its non-NMI context uses
> local_irq_disable() to achieve consistency.

Sigh... hvm_get_segment_register() also appears to be unsafe.

I will need to move it somewhere else.

-boris
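P.S. To make the direction concrete, here is an untested, self-contained
sketch of the split I have in mind: the NMI handler only records the
overflow through a lock-free flag, and everything that needs locks or
local_irq_disable() runs later from the softirq handler. The function
names and the C11 atomics below are illustrative stand-ins, not actual
Xen symbols; in Xen the flag-plus-wakeup would be the existing
raise_softirq(PMU_SOFTIRQ) on the local cpu.

/*
 * Hypothetical model of "do almost nothing in NMI context, defer the
 * heavy lifting to the softirq handler". Not Xen code.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Written from NMI context, so touched only via lock-free atomics. */
static atomic_bool pmu_overflow_pending;

/* NMI path: record the overflow and get out. No locks, no
 * local_irq_disable(), no vpmu_save(), no VMCS accesses. */
static void pmu_nmi_handler(void)
{
    atomic_store_explicit(&pmu_overflow_pending, true,
                          memory_order_release);
    /* In Xen this is where raise_softirq(PMU_SOFTIRQ) would go. */
}

/* Softirq path: ordinary interrupt semantics, so it may take locks,
 * disable interrupts, and safely do the work that vpmu_save_force()
 * and hvm_get_segment_register() need. */
static void pmu_softirq_handler(void)
{
    if ( !atomic_exchange_explicit(&pmu_overflow_pending, false,
                                   memory_order_acq_rel) )
        return;

    /* The xenpmu_data updates and guest notification from the patch
     * would be done here instead of in the NMI handler. */
    printf("PMU overflow handled in softirq context\n");
}

int main(void)
{
    pmu_nmi_handler();     /* stands in for the NMI firing */
    pmu_softirq_handler(); /* stands in for the deferred softirq run */
    return 0;
}

The open question is the register snapshot: the interrupted context has
to be captured before the NMI returns (it is gone afterwards), so only
the processing, not the capture itself, can move into the softirq
handler.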