From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
Date: Wed, 25 Sep 2013 11:52:22 -0400
Message-ID: <524306B6.4080808@oracle.com>
References: <1379670132-1748-1-git-send-email-boris.ostrovsky@oracle.com>
 <1379670132-1748-12-git-send-email-boris.ostrovsky@oracle.com>
 <5243106702000078000F6524@nat28.tlf.novell.com>
 <5242F5CD.3000804@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5242F5CD.3000804@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: jun.nakajima@intel.com, Jan Beulich, George.Dunlap@eu.citrix.com,
 jacob.shin@amd.com, eddie.dong@intel.com, dietmar.hahn@ts.fujitsu.com,
 suravee.suthikulpanit@amd.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 09/25/2013 10:40 AM, Andrew Cooper wrote:
> On 25/09/13 15:33, Jan Beulich wrote:
>>>>> On 20.09.13 at 11:42, Boris Ostrovsky wrote:
>>> Add support for handling PMU interrupts for PV guests, make these interrupts
>>> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the
>>> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0).
>> Is using NMIs here a necessity? I guess not, in which case I'd really
>> like this to be a (perhaps even non-default) option controllable via
>> command line option.
>>
>>> - * This interrupt handles performance counters interrupt
>>> - */
>>> -
>>> -void pmu_apic_interrupt(struct cpu_user_regs *regs)
>>> -{
>>> -    ack_APIC_irq();
>>> -    vpmu_do_interrupt(regs);
>>> -}
>> So this was the only caller of vpmu_do_interrupt(); no new one gets
>> added in this patch afaics, and I don't recall having seen addition of
>> another caller in earlier patches. What's the deal?
>>
>>> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>>>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>>  {
>>>      struct vcpu *v = current;
>>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>
>>> -    if ( vpmu->arch_vpmu_ops )
>>> +    struct vpmu_struct *vpmu;
>>> +
>>> +    /* dom0 will handle this interrupt */
>>> +    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
>>> +         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
>>> +    {
>>> +        if ( smp_processor_id() >= dom0->max_vcpus )
>>> +            return 0;
>>> +        v = dom0->vcpu[smp_processor_id()];
>>> +    }
>>> +
>>> +    vpmu = vcpu_vpmu(v);
>>> +    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>>> +        return 0;
>>> +
>>> +    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
>>> +    {
>>> +        /* PV guest or dom0 is doing system profiling */
>>> +        void *p;
>>> +        struct cpu_user_regs *gregs;
>>> +
>>> +        p = &v->arch.vpmu.xenpmu_data->pmu.regs;
>>> +
>>> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
>>> +        vpmu_save_force(v);
>>> +
>>> +        /* Store appropriate registers in xenpmu_data
>>> +         *
>>> +         * Note: '!current->is_running' is possible when 'set_current(next)'
>>> +         * for the (HVM) guest has been called but 'reset_stack_and_jump()'
>>> +         * has not (i.e. the guest is not actually running yet).
>>> +         */
>>> +        if ( !is_hvm_domain(current->domain) ||
>>> +             ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) )
>>> +        {
>>> +            /*
>>> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>>> +             * and therefore we treat it the same way as a non-privileged
>>> +             * PV 32-bit domain.
>>> +             */
>>> +            if ( is_pv_32bit_domain(current->domain) )
>>> +            {
>>> +                struct compat_cpu_user_regs cmp;
>>> +
>>> +                gregs = guest_cpu_user_regs();
>>> +                XLAT_cpu_user_regs(&cmp, gregs);
>>> +                memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs));
>>> +            }
>>> +            else if ( (current->domain != dom0) && !is_idle_vcpu(current) &&
>>> +                      !(vpmu_mode & XENPMU_MODE_PRIV) )
>>> +            {
>>> +                /* PV guest */
>>> +                gregs = guest_cpu_user_regs();
>>> +                memcpy(p, gregs, sizeof(struct cpu_user_regs));
>>> +            }
>>> +            else
>>> +                memcpy(p, regs, sizeof(struct cpu_user_regs));
>>> +        }
>>> +        else
>>> +        {
>>> +            /* HVM guest */
>>> +            struct segment_register cs;
>>> +
>>> +            gregs = guest_cpu_user_regs();
>>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>>> +            gregs->cs = cs.attr.fields.dpl;
>>> +
>>> +            memcpy(p, gregs, sizeof(struct cpu_user_regs));
>>> +        }
>>> +
>>> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
>>> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
>>> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
>>> +
>>> +        raise_softirq(PMU_SOFTIRQ);
>>> +        vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH);
>>> +
>>> +        return 1;
>>> +    }
>>> +    else if ( vpmu->arch_vpmu_ops )
>>>      {
>>> -        struct vlapic *vlapic = vcpu_vlapic(v);
>>> +        /* HVM guest */
>>> +        struct vlapic *vlapic;
>>>          u32 vlapic_lvtpc;
>>>          unsigned char int_vec;
>>>
>>>          if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>>>              return 0;
>>>
>>> +        vlapic = vcpu_vlapic(v);
>>>          if ( !is_vlapic_lvtpc_enabled(vlapic) )
>>>              return 1;
>>>
>> Assuming the plan is to run this in NMI context - this is _a lot_ of
>> stuff you want to do. Did you carefully audit all paths for being
>> NMI-safe?
>>
>> Jan
> vpmu_save() is not safe from an NMI context, as its non-NMI context uses
> local_irq_disable() to achieve consistency.

Sigh... hvm_get_segment_register() also appears to be unsafe.

I will need to move it somewhere else.

-boris
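P.S. To make the direction concrete, here is an untested, self-contained
sketch of the split I have in mind: the NMI handler only records the
overflow through a lock-free flag, and everything that needs locks or
local_irq_disable() runs later from the softirq handler. The function
names and the C11 atomics below are illustrative stand-ins, not actual
Xen symbols; in Xen the flag-plus-wakeup would be the existing
raise_softirq(PMU_SOFTIRQ) on the local cpu.

/*
 * Hypothetical model of "do almost nothing in NMI context, defer the
 * heavy lifting to the softirq handler". Not Xen code.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Written from NMI context, so touched only via lock-free atomics. */
static atomic_bool pmu_overflow_pending;

/* NMI path: record the overflow and get out. No locks, no
 * local_irq_disable(), no vpmu_save(), no VMCS accesses. */
static void pmu_nmi_handler(void)
{
    atomic_store_explicit(&pmu_overflow_pending, true,
                          memory_order_release);
    /* In Xen this is where raise_softirq(PMU_SOFTIRQ) would go. */
}

/* Softirq path: ordinary interrupt semantics, so it may take locks,
 * disable interrupts, and safely do the work that vpmu_save_force()
 * and hvm_get_segment_register() need. */
static void pmu_softirq_handler(void)
{
    if ( !atomic_exchange_explicit(&pmu_overflow_pending, false,
                                   memory_order_acq_rel) )
        return;

    /* The xenpmu_data updates and guest notification from the patch
     * would be done here instead of in the NMI handler. */
    printf("PMU overflow handled in softirq context\n");
}

int main(void)
{
    pmu_nmi_handler();     /* stands in for the NMI firing */
    pmu_softirq_handler(); /* stands in for the deferred softirq run */
    return 0;
}

The open question is the register snapshot: the interrupted context has
to be captured before the NMI returns (it is gone afterwards), so only
the processing, not the capture itself, can move into the softirq
handler.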