From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for
 PV guests
Date: Wed, 25 Sep 2013 15:40:13 +0100
Message-ID: <5242F5CD.3000804@citrix.com>
References: <1379670132-1748-1-git-send-email-boris.ostrovsky@oracle.com>
	<1379670132-1748-12-git-send-email-boris.ostrovsky@oracle.com>
	<5243106702000078000F6524@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <Andrew.Cooper3@citrix.com>) id 1VOqGI-0004iO-17
	for xen-devel@lists.xenproject.org; Wed, 25 Sep 2013 14:40:18 +0000
In-Reply-To: <5243106702000078000F6524@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: jun.nakajima@intel.com, George.Dunlap@eu.citrix.com, jacob.shin@amd.com, eddie.dong@intel.com, dietmar.hahn@ts.fujitsu.com, suravee.suthikulpanit@amd.com, xen-devel <xen-devel@lists.xenproject.org>, Boris Ostrovsky <boris.ostrovsky@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 25/09/13 15:33, Jan Beulich wrote:
>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> Add support for handling PMU interrupts for PV guests, make these interrupts
>> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the
>> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0).
> Is using NMIs here a necessity? I guess not, in which case I'd really
> like this to be a (perhaps even non-default) option controllable via
> command line option.
>
>> - * This interrupt handles performance counters interrupt
>> - */
>> -
>> -void pmu_apic_interrupt(struct cpu_user_regs *regs)
>> -{
>> -    ack_APIC_irq();
>> -    vpmu_do_interrupt(regs);
>> -}
> So this was the only caller of vpmu_do_interrupt(); no new one gets
> added in this patch afaics, and I don't recall having seen addition of
> another caller in earlier patches. What's the deal?
>
>> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>  {
>>      struct vcpu *v = current;
>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>> +    struct vpmu_struct *vpmu;
>>  
>> -    if ( vpmu->arch_vpmu_ops )
>> +
>> +    /* dom0 will handle this interrupt */
>> +    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
>> +        (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
>> +    {
>> +            if ( smp_processor_id() >= dom0->max_vcpus )
>> +                return 0;
>> +            v = dom0->vcpu[smp_processor_id()];
>> +    }
>> +
>> +    vpmu = vcpu_vpmu(v);
>> +    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>> +        return 0;
>> +
>> +    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
>> +    {
>> +        /* PV guest or dom0 is doing system profiling */
>> +        void *p;
>> +        struct cpu_user_regs *gregs;
>> +
>> +        p = &v->arch.vpmu.xenpmu_data->pmu.regs;
>> +
>> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
>> +        vpmu_save_force(v);
>> +
>> +        /* Store appropriate registers in xenpmu_data
>> +         *
>> +         * Note: '!current->is_running' is possible when 'set_current(next)'
>> +         * for the (HVM) guest has been called but 'reset_stack_and_jump()'
>> +         * has not (i.e. the guest is not actually running yet).
>> +         */
>> +        if ( !is_hvm_domain(current->domain) ||
>> +             ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) )
>> +        {
>> +            /*
>> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>> +             * and therefore we treat it the same way as a non-priviledged
>> +             * PV 32-bit domain.
>> +             */
>> +            if ( is_pv_32bit_domain(current->domain) )
>> +            {
>> +                struct compat_cpu_user_regs cmp;
>> +
>> +                gregs = guest_cpu_user_regs();
>> +                XLAT_cpu_user_regs(&cmp, gregs);
>> +                memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs));
>> +            }
>> +            else if ( (current->domain != dom0) && !is_idle_vcpu(current) &&
>> +                !(vpmu_mode & XENPMU_MODE_PRIV) )
>> +            {
>> +                /* PV guest */
>> +                gregs = guest_cpu_user_regs();
>> +                memcpy(p, gregs, sizeof(struct cpu_user_regs));
>> +            }
>> +            else
>> +                memcpy(p, regs, sizeof(struct cpu_user_regs));
>> +        }
>> +        else
>> +        {
>> +            /* HVM guest */
>> +            struct segment_register cs;
>> +
>> +            gregs = guest_cpu_user_regs();
>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>> +            gregs->cs = cs.attr.fields.dpl;
>> +
>> +            memcpy(p, gregs, sizeof(struct cpu_user_regs));
>> +        }
>> +
>> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
>> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
>> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
>> +
>> +        raise_softirq(PMU_SOFTIRQ);
>> +        vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH);
>> +
>> +        return 1;
>> +    }
>> +    else  if ( vpmu->arch_vpmu_ops )
>>      {
>> -        struct vlapic *vlapic = vcpu_vlapic(v);
>> +        /* HVM guest */
>> +        struct vlapic *vlapic;
>>          u32 vlapic_lvtpc;
>>          unsigned char int_vec;
>>  
>>          if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>>              return 0;
>>  
>> +        vlapic = vcpu_vlapic(v);
>>          if ( !is_vlapic_lvtpc_enabled(vlapic) )
>>              return 1;
>>  
> Assuming the plan is to run this in NMI context - this is _a lot_ of
> stuff you want to do. Did you carefully audit all paths for being
> NMI-safe?
>
> Jan

vpmu_save() is not safe to call from NMI context: its non-NMI callers
rely on local_irq_disable() for consistency, and that does not mask NMIs.

~Andrew