From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests Date: Tue, 30 Sep 2014 12:37:43 -0400 Message-ID: <542ADC57.1070708@oracle.com> References: <1411673336-32736-1-git-send-email-boris.ostrovsky@oracle.com> <1411673336-32736-17-git-send-email-boris.ostrovsky@oracle.com> <542A81B6020000780003ADD4@mail.emea.novell.com> <542AC71C.6020804@oracle.com> <542AEC00020000780003B224@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <542AEC00020000780003B224@mail.emea.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: kevin.tian@intel.com, keir@xen.org, suravee.suthikulpanit@amd.com, andrew.cooper3@citrix.com, tim@xen.org, dietmar.hahn@ts.fujitsu.com, xen-devel@lists.xen.org, Aravind.Gopalakrishnan@amd.com, jun.nakajima@intel.com, dgdegra@tycho.nsa.gov List-Id: xen-devel@lists.xenproject.org On 09/30/2014 11:44 AM, Jan Beulich wrote: > >>>> +static struct vcpu *choose_hwdom_vcpu(void) >>>> +{ >>>> + struct vcpu *v; >>>> + unsigned idx = smp_processor_id() % hardware_domain->max_vcpus; >>>> + >>>> + if ( hardware_domain->vcpu == NULL ) >>>> + return NULL; >>>> + >>>> + v = hardware_domain->vcpu[idx]; >>>> + >>>> + /* >>>> + * If index is not populated search downwards the vcpu array until >>>> + * a valid vcpu can be found >>>> + */ >>>> + while ( !v && idx-- ) >>>> + v = hardware_domain->vcpu[idx]; >>> Each time I get here I wonder what case this is good for. >> I thought we can have a case when first hardware_domain->vcpu[idx] is >> NULL so we walk the array down until we find the first non-NULL vcpu. >> Hot unplug, for example (we may be calling this from NMI context). > Hot unplug of a vCPU is a guest thing - this doesn't destroy the > vCPU in the hypervisor. OK, I don't need this loop then. > >>>> int vpmu_do_interrupt(struct cpu_user_regs *regs) >>>> { >>>> - struct vcpu *v = current; >>>> - struct vpmu_struct *vpmu = vcpu_vpmu(v); >>>> + struct vcpu *sampled = current, *sampling; >>>> + struct vpmu_struct *vpmu; >>>> + >>>> + /* dom0 will handle interrupt for special domains (e.g. idle domain) */ >>>> + if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED ) >>>> + { >>>> + sampling = choose_hwdom_vcpu(); >>>> + if ( !sampling ) >>>> + return 0; >>>> + } >>>> + else >>>> + sampling = sampled; >>>> + >>>> + vpmu = vcpu_vpmu(sampling); >>>> + if ( !is_hvm_domain(sampling->domain) ) >>>> + { >>>> + /* PV(H) guest */ >>>> + const struct cpu_user_regs *cur_regs; >>>> + >>>> + if ( !vpmu->xenpmu_data ) >>>> + return 0; >>>> + >>>> + if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED ) >>>> + return 1; >>>> + >>>> + if ( is_pvh_domain(sampled->domain) && >>> Here and below - is this really the right condition? I.e. is the >>> opposite case (doing nothing here, but the one further down >>> having an else) really meant to cover both HVM and PV? The outer >>> !is_hvm_() doesn't count here as that acts on sampling, not >>> sampled. >> This is test for an error in do_interrupt() --- if it reported a failure >> then there is no reason to proceed further. > That's not the question. Why is this done only for PVH? This should be sampling, i.e. the guest who is managing the HW PMU MSR. Not sampled. > >>>> + { >>>> + r->cs = cur_regs->cs; >>>> + if ( sampled->arch.flags & TF_kernel_mode ) >>>> + r->cs &= ~3; >>> And once again I wonder how the consumer of this data is to tell >>> apart guest kernel and hypervisor addresses. >> Based on the RIP --- perf, for example, searches through various symbol >> tables. > That doesn't help when profiling HVM/PVH guests - addresses are > ambiguous in that case. Hypervisor traces are only sent to dom0, which is currently PV only. The key here, of course, is the word 'currently'. > >> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF >> for guest and DOMID_XEN for the hypervisor. > That's an option, but I'm really having reservations against simulating > ring-0 execution in PV guests here. It would certainly be better if we > could report reality here, but I can see reservations on the consumer > (perf) side against us doing so. Yes, perf will probably not like it --- as I mentioned in an earlier message, it calls user_mode(regs) which is essentially !!(regs->cs & 3). -boris