From: Boris Ostrovsky
Subject: Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
Date: Tue, 12 Aug 2014 11:12:38 -0400
Message-ID: <53EA2EE6.7060100@oracle.com>
In-Reply-To: <53EA0A7E020000780002B81E@mail.emea.novell.com>
To: Jan Beulich
Cc: kevin.tian@intel.com, keir@xen.org, suravee.suthikulpanit@amd.com,
    andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
    jun.nakajima@intel.com
List-Id: xen-devel@lists.xenproject.org

On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>
>>      if ( is_hvm_vcpu(prev) )
>>      {
>> -        if (prev != next)
>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>              vpmu_save(prev);
>>
>>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>                             !is_hardware_domain(next->domain));
>>      }
>>
>> -    if (is_hvm_vcpu(next) && (prev != next) )
>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>          /* Must be done with interrupts enabled */
>>          vpmu_load(next);
> Wouldn't such vPMU internals be better hidden in the functions
> themselves? I realize you can save the calls this way, but if the
> condition changes again later, we'll again have to adjust this
> core function rather than just the vPMU code. It's bad enough that
> the vpmu_mode variable is visible to non-vPMU code.

How about an inline function? (Rough sketch below.)

>
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -21,6 +21,8 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -32,13 +34,21 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +
>> +#include
>> +CHECK_pmu_params;
>> +CHECK_pmu_intel_ctxt;
>> +CHECK_pmu_amd_ctxt;
>> +CHECK_pmu_cntr_pair;
> Such being placed in a HVM-specific file suggests that the series is badly
> ordered: Anything relevant to PV should normally only be added here
> _after_ the file got moved out of the hvm/ subtree. Yet since I realize
> this would be a major re-work, I think you can leave this as is unless
> you expect potentially just part of the series to go in, with the splitting
> point being (immediately or later) after this patch (from my looking at
> it I would suppose that patches 3 and 4 could go in right away - they
> don't appear to depend on patches 1 and 2 - and patch 1 probably
> could go in too, but it doesn't make much sense to have it in without
> the rest of the series).

I would prefer the whole series to get in in one go (with patch 2
possibly being the only exception). All other patches (including
patch 1) are not particularly useful on their own.
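
Coming back to the context_switch() hunk above: to make the inline
function suggestion a bit more concrete, here is a minimal sketch of
what I have in mind (vpmu_switch_from/vpmu_switch_to are placeholder
names, and this assumes vpmu_mode stays declared in the vPMU header;
this is not the final interface):

/* Sketch only: would live next to the vpmu_save()/vpmu_load() declarations. */
/* The is_hvm_vcpu() checks stay at the call sites in context_switch(). */
static inline void vpmu_switch_from(struct vcpu *prev, struct vcpu *next)
{
    if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
        vpmu_save(prev);
}

static inline void vpmu_switch_to(struct vcpu *prev, struct vcpu *next)
{
    if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
        vpmu_load(next);
}

context_switch() would then just call vpmu_switch_from(prev, next) and
vpmu_switch_to(prev, next), so any future change to the condition stays
inside the vPMU code.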
>> +        goto cont_wait;
>> +
>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>> +                   cpumask_of(smp_processor_id()));
>> +
>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk("vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +
>> +    j = 0;
>> +    for_each_cpu ( i, &allbutself )
> This looks to be the only use for the (on stack) allbutself variable,
> but you could easily avoid this by using for_each_online_cpu() and
> skipping the local one. I'd also recommend that you count
> allbutself_num here rather than up front, since that will much more
> obviously prove that you wait for exactly as many CPUs as you
> scheduled. The array allocation above is bogus anyway, as on a
> huge system this can easily be more than a page in size.

I don't understand the last sentence. What does page-sized allocation
have to do with this?

>
>> +        tasklet_schedule_on_cpu(&sync_task[j++], i);
>> +
>> +    vpmu_save(current);
>> +
>> +    start = NOW();
>> +
>> + cont_wait:
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-unplugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        /* Give up after 5 seconds */
>> +        if ( NOW() > start + SECONDS(5) )
>> +        {
>> +            printk("vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +        cpu_relax();
>> +        if ( hypercall_preempt_check() )
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_kill(&sync_task[i]);
>> +    xfree(sync_task);
>> +    sync_task = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +    {
>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>> +        uint32_t current_mode;
>> +
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>> +            return -EINVAL;
>> +
>> +        /*
>> +         * Return error is someone else is in the middle of changing mode ---
>> +         * this is most likely indication of two system administrators
>> +         * working against each other
>> +         */
>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>> +            return -EAGAIN;
>> +
>> +        current_mode = vpmu_mode;
>> +        vpmu_mode = pmu_params.val;
>> +
>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>> +        {
>> +            /*
>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>> +             * can be achieved by having all physical processors go through
>> +             * context_switch().
>> +             */
>> +            ret = vpmu_force_context_switch(arg);
>> +            if ( ret )
>> +                vpmu_mode = current_mode;
>> +        }
>> +        else
>> +            ret = 0;
>> +
>> +        spin_unlock(&xenpmu_mode_lock);
>> +        break;
> This still isn't safe: There's nothing preventing another vCPU to issue
> another XENPMU_mode_set operation while the one turning the vPMU
> off is still in the process of waiting, but having exited the lock protected
> region in order to allow other processing to occur.

I think that's OK: one of these two competing requests will time out in
the while loop of vpmu_force_context_switch().
It will, in fact, likely be the first caller because the second one will
get right into the while loop without setting up the waiting data. This
is a little counter-intuitive, but trying to set the mode simultaneously
from two different places is asking for trouble anyway.

> I think you simply
> need another mode "being-turned-off" during which only mode
> changes to XENPMU_MODE_OFF, and only by the originally requesting
> vCPU, are permitted (or else your "if true, we are in hypercall
> continuation" comment above wouldn't always be true either, as that
> second vCPU might also issue a second turn-off request).

-boris
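
P.S. Just so we are talking about the same thing: if we did end up
needing the intermediate "being-turned-off" state you describe, I would
picture it roughly like this (XENPMU_MODE_PENDING_OFF and
xenpmu_mode_changer are invented names, xenpmu_mode_lock would have to
move to file scope, and none of this is in the posted series):

/* Sketch only: guard against competing mode changes during turn-off. */
static struct vcpu *xenpmu_mode_changer; /* protected by xenpmu_mode_lock */

static int xenpmu_begin_mode_off(void)
{
    int ret = 0;

    spin_lock(&xenpmu_mode_lock);

    if ( (vpmu_mode == XENPMU_MODE_PENDING_OFF) &&
         (xenpmu_mode_changer != current) )
        /* Another vCPU is already in the middle of turning the vPMU off */
        ret = -EAGAIN;
    else
    {
        vpmu_mode = XENPMU_MODE_PENDING_OFF;
        xenpmu_mode_changer = current;
    }

    spin_unlock(&xenpmu_mode_lock);

    return ret;
}

The hypercall would call this before waiting for the other CPUs, and only
switch to XENPMU_MODE_OFF (and clear xenpmu_mode_changer) once all of
them have checked in, so a competing XENPMU_mode_set request seen while
the mode is XENPMU_MODE_PENDING_OFF would get -EAGAIN instead of racing
with the continuation.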