xen-devel.lists.xenproject.org archive mirror
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: kevin.tian@intel.com, keir@xen.org,
	suravee.suthikulpanit@amd.com, andrew.cooper3@citrix.com,
	tim@xen.org, xen-devel@lists.xen.org, jun.nakajima@intel.com
Subject: Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
Date: Tue, 12 Aug 2014 11:12:38 -0400	[thread overview]
Message-ID: <53EA2EE6.7060100@oracle.com> (raw)
In-Reply-To: <53EA0A7E020000780002B81E@mail.emea.novell.com>

On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>   
>>       if ( is_hvm_vcpu(prev) )
>>       {
>> -        if (prev != next)
>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>               vpmu_save(prev);
>>   
>>           if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>                              !is_hardware_domain(next->domain));
>>       }
>>   
>> -    if (is_hvm_vcpu(next) && (prev != next) )
>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>           /* Must be done with interrupts enabled */
>>           vpmu_load(next);
> Wouldn't such vPMU internals be better hidden in the functions
> themselves? I realize you can save the calls this way, but if the
> condition changes again later, we'll again have to adjust this
> core function rather than just the vPMU code. It's bad enough that
> the vpmu_mode variable is visible to non-vPMU code.

How about an inline function?

>
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -21,6 +21,8 @@
>>   #include <xen/config.h>
>>   #include <xen/sched.h>
>>   #include <xen/xenoprof.h>
>> +#include <xen/event.h>
>> +#include <xen/guest_access.h>
>>   #include <asm/regs.h>
>>   #include <asm/types.h>
>>   #include <asm/msr.h>
>> @@ -32,13 +34,21 @@
>>   #include <asm/hvm/svm/vmcb.h>
>>   #include <asm/apic.h>
>>   #include <public/pmu.h>
>> +#include <xen/tasklet.h>
>> +
>> +#include <compat/pmu.h>
>> +CHECK_pmu_params;
>> +CHECK_pmu_intel_ctxt;
>> +CHECK_pmu_amd_ctxt;
>> +CHECK_pmu_cntr_pair;
> Such being placed in a HVM-specific file suggests that the series is badly
> ordered: Anything relevant to PV should normally only be added here
> _after_ the file got moved out of the hvm/ subtree. Yet since I realize
> this would be a major re-work, I think you can leave this as is unless
> you expect potentially just part of the series to go in, with the splitting
> point being (immediately or later) after this patch (from my looking at
> it I would suppose that patches 3 and 4 could go in right away - they
> don't appear to depend on patches 1 and 2 - and patch 1 probably
> could go in too, but it doesn't make much sense to have it in without
> the rest of the series).

I would prefer the whole series to go in as a single unit (with patch 2
possibly being the only exception). None of the other patches (including
patch 1) is particularly useful on its own.

>> +        goto cont_wait;
>> +
>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>> +                   cpumask_of(smp_processor_id()));
>> +
>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk("vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +
>> +    j = 0;
>> +    for_each_cpu ( i, &allbutself )
> This looks to be the only use for the (on stack) allbutself variable,
> but you could easily avoid this by using for_each_online_cpu() and
> skipping the local one. I'd also recommend that you count
> allbutself_num here rather than up front, since that will much more
> obviously prove that you wait for exactly as many CPUs as you
> scheduled. The array allocation above is bogus anyway, as on a
> huge system this can easily be more than a page in size.

I don't understand the last sentence. What does the allocation exceeding
a page in size have to do with this?

>
>> +        tasklet_schedule_on_cpu(&sync_task[j++], i);
>> +
>> +    vpmu_save(current);
>> +
>> +    start = NOW();
>> +
>> + cont_wait:
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-unplugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        /* Give up after 5 seconds */
>> +        if ( NOW() > start + SECONDS(5) )
>> +        {
>> +            printk("vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +        cpu_relax();
>> +        if ( hypercall_preempt_check() )
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_kill(&sync_task[i]);
>> +    xfree(sync_task);
>> +    sync_task = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +    {
>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>> +        uint32_t current_mode;
>> +
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>> +            return -EINVAL;
>> +
>> +        /*
>> +         * Return error is someone else is in the middle of changing mode ---
>> +         * this is most likely indication of two system administrators
>> +         * working against each other
>> +         */
>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>> +            return -EAGAIN;
>> +
>> +        current_mode = vpmu_mode;
>> +        vpmu_mode = pmu_params.val;
>> +
>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>> +        {
>> +            /*
>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>> +             * can be achieved by having all physical processors go through
>> +             * context_switch().
>> +             */
>> +            ret = vpmu_force_context_switch(arg);
>> +            if ( ret )
>> +                vpmu_mode = current_mode;
>> +        }
>> +        else
>> +            ret = 0;
>> +
>> +        spin_unlock(&xenpmu_mode_lock);
>> +        break;
> This still isn't safe: There's nothing preventing another vCPU to issue
> another XENPMU_mode_set operation while the one turning the vPMU
> off is still in the process of waiting, but having exited the lock protected
> region in order to allow other processing to occur.

I think that's OK: one of the two competing requests will time out in
the while loop of vpmu_force_context_switch(). In fact it will likely be
the first caller, because the second one jumps straight into the while
loop without having set up the waiting data. This is a little
counter-intuitive, but trying to set the mode simultaneously from two
different places is asking for trouble anyway.


> I think you simply
> need another mode "being-turned-off" during which only mode
> changes to XENPMU_MODE_OFF, and only by the originally requesting
> vCPU, are permitted (or else your "if true, we are in hypercall
> continuation" comment above wouldn't always be true either, as that
> second vCPU might also issue a second turn-off request).

-boris

Thread overview: 49+ messages
2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
2014-08-11 13:28   ` Jan Beulich
2014-08-11 15:35     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-08-11 13:45   ` Jan Beulich
2014-08-11 16:01     ` Boris Ostrovsky
2014-08-11 16:13       ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 06/20] vmx: Merge MSR management routines Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-08-11 13:49   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-08-11 14:08   ` Jan Beulich
2014-08-11 16:15     ` Boris Ostrovsky
2014-08-18 16:02       ` Boris Ostrovsky
2014-08-18 20:23         ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-08-12 10:37   ` Jan Beulich
2014-08-12 15:12     ` Boris Ostrovsky [this message]
2014-08-12 15:35       ` Jan Beulich
2014-08-12 16:25         ` Boris Ostrovsky
2014-08-14 16:32           ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
2014-08-12 11:59   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
2014-08-12 12:45   ` Jan Beulich
2014-08-12 15:47     ` Boris Ostrovsky
2014-08-12 16:00       ` Jan Beulich
2014-08-12 16:30         ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
2014-08-12 12:55   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-08-12 12:58   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-08-12 13:06   ` Jan Beulich
2014-08-12 16:14     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-08-12 14:15   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-08-12 14:24   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2014-08-12 14:26   ` Jan Beulich
2014-08-11 13:32 ` [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Jan Beulich
