From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
Date: Tue, 12 Aug 2014 12:25:24 -0400
Message-ID: <53EA3FF4.3060208@oracle.com>
References: <1407516946-17833-1-git-send-email-boris.ostrovsky@oracle.com>
 <1407516946-17833-12-git-send-email-boris.ostrovsky@oracle.com>
 <53EA0A7E020000780002B81E@mail.emea.novell.com>
 <53EA2EE6.7060100@oracle.com>
 <53EA5051020000780002BBC0@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <53EA5051020000780002BBC0@mail.emea.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: kevin.tian@intel.com, keir@xen.org, suravee.suthikulpanit@amd.com,
 andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
 jun.nakajima@intel.com
List-Id: xen-devel@lists.xenproject.org

On 08/12/2014 11:35 AM, Jan Beulich wrote:
>>>> On 12.08.14 at 17:12, wrote:
>> On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>>>> On 08.08.14 at 18:55, wrote:
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>
>>>>      if ( is_hvm_vcpu(prev) )
>>>>      {
>>>> -        if (prev != next)
>>>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>              vpmu_save(prev);
>>>>
>>>>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>>>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>                    !is_hardware_domain(next->domain));
>>>>      }
>>>>
>>>> -    if (is_hvm_vcpu(next) && (prev != next) )
>>>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>          /* Must be done with interrupts enabled */
>>>>          vpmu_load(next);
>>> Wouldn't such vPMU internals be better hidden in the functions
>>> themselves?
>>> I realize you can save the calls this way, but if the
>>> condition changes again later, we'll again have to adjust this
>>> core function rather than just the vPMU code. It's bad enough that
>>> the vpmu_mode variable is visible to non-vPMU code.
>> How about an inline function?
> Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
> logic to be handled in the called vPMU functions.

I was thinking of an inline in vpmu.h, something like

static inline void vpmu_next(struct vcpu *prev, struct vcpu *next)
{
    if ( is_hvm_vcpu(next) && (prev != next) &&
         (vpmu_mode & XENPMU_MODE_SELF) )
        /* Must be done with interrupts enabled */
        vpmu_load(next);
}

(and similar for vpmu_save())

>> I would prefer the whole series to get in in one go (with patch 2
>> possibly being the only exception). All other patches (including patch
>> 1) are not particularly useful on their own.
> Okay. In that case in patch 1, please consider swapping struct
> xenpf_symdata's address and name fields (I had put a respective
> note on the patch for when committing it), shrinking the structure
> for compat mode guests.
>
>>>> +        goto cont_wait;
>>>> +
>>>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>>>> +                   cpumask_of(smp_processor_id()));
>>>> +
>>>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>>>> +    if ( !sync_task )
>>>> +    {
>>>> +        printk("vpmu_force_context_switch: out of memory\n");
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    for ( i = 0; i < allbutself_num; i++ )
>>>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>>>> +
>>>> +    atomic_set(&vpmu_sched_counter, 0);
>>>> +
>>>> +    j = 0;
>>>> +    for_each_cpu ( i, &allbutself )
>>> This looks to be the only use for the (on stack) allbutself variable,
>>> but you could easily avoid this by using for_each_online_cpu() and
>>> skipping the local one.
>>> I'd also recommend that you count
>>> allbutself_num here rather than up front, since that will much more
>>> obviously prove that you wait for exactly as many CPUs as you
>>> scheduled. The array allocation above is bogus anyway, as on a
>>> huge system this can easily be more than a page in size.
>> I don't understand the last sentence. What does page-sized allocation
>> have to do with this?
> We dislike greater than page size allocations at runtime.

I can allocate an array of pointers and then allocate the tasklets in
the for(i) loop above if that makes it more palatable, but I still need
the allbutself_num value earlier than what you suggested.

>
>>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>> +{
>>>> +    int ret = -EINVAL;
>>>> +    xen_pmu_params_t pmu_params;
>>>> +
>>>> +    switch ( op )
>>>> +    {
>>>> +    case XENPMU_mode_set:
>>>> +    {
>>>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>>> +        uint32_t current_mode;
>>>> +
>>>> +        if ( !is_control_domain(current->domain) )
>>>> +            return -EPERM;
>>>> +
>>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>> +            return -EFAULT;
>>>> +
>>>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>>> +            return -EINVAL;
>>>> +
>>>> +        /*
>>>> +         * Return an error if someone else is in the middle of changing
>>>> +         * the mode --- this is most likely an indication of two system
>>>> +         * administrators working against each other.
>>>> +         */
>>>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>>> +            return -EAGAIN;
>>>> +
>>>> +        current_mode = vpmu_mode;
>>>> +        vpmu_mode = pmu_params.val;
>>>> +
>>>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>>>> +        {
>>>> +            /*
>>>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs.
>>>> +             * This can be achieved by having all physical processors go
>>>> +             * through context_switch().
>>>> +             */
>>>> +            ret = vpmu_force_context_switch(arg);
>>>> +            if ( ret )
>>>> +                vpmu_mode = current_mode;
>>>> +        }
>>>> +        else
>>>> +            ret = 0;
>>>> +
>>>> +        spin_unlock(&xenpmu_mode_lock);
>>>> +        break;
>>> This still isn't safe: there's nothing preventing another vCPU from
>>> issuing another XENPMU_mode_set operation while the one turning the
>>> vPMU off is still in the process of waiting, but has exited the
>>> lock-protected region in order to allow other processing to occur.
>> I think that's OK: one of these two competing requests will time out in
>> the while loop of vpmu_force_context_switch(). It will, in fact, likely
>> be the first caller, because the second one will get right into the
>> while loop without setting up the waiting data. This is a little
>> counter-intuitive, but trying to set the mode simultaneously from two
>> different places is asking for trouble anyway.
> Sure it is, but we nevertheless want the hypervisor to be safe. So I
> consider what you currently have acceptable only if you can prove
> that nothing bad can happen no matter in what order multiple
> requests get issued - from looking over your code I wasn't able to
> convince myself of this.

Would a comment with what I said above (possibly a bit massaged) be
sufficient?

-boris