All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: kevin.tian@intel.com, keir@xen.org,
	suravee.suthikulpanit@amd.com, andrew.cooper3@citrix.com,
	tim@xen.org, xen-devel@lists.xen.org, jun.nakajima@intel.com
Subject: Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
Date: Tue, 12 Aug 2014 11:12:38 -0400	[thread overview]
Message-ID: <53EA2EE6.7060100@oracle.com> (raw)
In-Reply-To: <53EA0A7E020000780002B81E@mail.emea.novell.com>

On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>   
>>       if ( is_hvm_vcpu(prev) )
>>       {
>> -        if (prev != next)
>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>               vpmu_save(prev);
>>   
>>           if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>                              !is_hardware_domain(next->domain));
>>       }
>>   
>> -    if (is_hvm_vcpu(next) && (prev != next) )
>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>           /* Must be done with interrupts enabled */
>>           vpmu_load(next);
> Wouldn't such vPMU internals be better hidden in the functions
> themselves? I realize you can save the calls this way, but if the
> condition changes again later, we'll again have to adjust this
> core function rather than just the vPMU code. It's bad enough that
> the vpmu_mode variable is visible to non-vPMU code.

How about an inline function?

>
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -21,6 +21,8 @@
>>   #include <xen/config.h>
>>   #include <xen/sched.h>
>>   #include <xen/xenoprof.h>
>> +#include <xen/event.h>
>> +#include <xen/guest_access.h>
>>   #include <asm/regs.h>
>>   #include <asm/types.h>
>>   #include <asm/msr.h>
>> @@ -32,13 +34,21 @@
>>   #include <asm/hvm/svm/vmcb.h>
>>   #include <asm/apic.h>
>>   #include <public/pmu.h>
>> +#include <xen/tasklet.h>
>> +
>> +#include <compat/pmu.h>
>> +CHECK_pmu_params;
>> +CHECK_pmu_intel_ctxt;
>> +CHECK_pmu_amd_ctxt;
>> +CHECK_pmu_cntr_pair;
> Such being placed in a HVM-specific file suggests that the series is badly
> ordered: Anything relevant to PV should normally only be added here
> _after_ the file got moved out of the hvm/ subtree. Yet since I realize
> this would be a major re-work, I think you can leave this as is unless
> you expect potentially just part of the series to go in, with the splitting
> point being (immediately or later) after this patch (from my looking at
> it I would suppose that patches 3 and 4 could go in right away - they
> don't appear to depend on patches 1 and 2 - and patch 1 probably
> could go in too, but it doesn't make much sense to have it in without
> the rest of the series).

I would prefer the whole series to get in in one go (with patch 2 
possibly being the only exception). All other patches (including patch 
1) are not particularly useful on their own.

>> +        goto cont_wait;
>> +
>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>> +                   cpumask_of(smp_processor_id()));
>> +
>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk("vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +
>> +    j = 0;
>> +    for_each_cpu ( i, &allbutself )
> This looks to be the only use for the (on stack) allbutself variable,
> but you could easily avoid this by using for_each_online_cpu() and
> skipping the local one. I'd also recommend that you count
> allbutself_num here rather than up front, since that will much more
> obviously prove that you wait for exactly as many CPUs as you
> scheduled. The array allocation above is bogus anyway, as on a
> huge system this can easily be more than a page in size.

I don't understand the last sentence. What does page-sized allocation 
have to do with this?

>
>> +        tasklet_schedule_on_cpu(&sync_task[j++], i);
>> +
>> +    vpmu_save(current);
>> +
>> +    start = NOW();
>> +
>> + cont_wait:
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-unplugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        /* Give up after 5 seconds */
>> +        if ( NOW() > start + SECONDS(5) )
>> +        {
>> +            printk("vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +        cpu_relax();
>> +        if ( hypercall_preempt_check() )
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_kill(&sync_task[i]);
>> +    xfree(sync_task);
>> +    sync_task = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +    {
>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>> +        uint32_t current_mode;
>> +
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>> +            return -EINVAL;
>> +
>> +        /*
>> +         * Return error is someone else is in the middle of changing mode ---
>> +         * this is most likely indication of two system administrators
>> +         * working against each other
>> +         */
>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>> +            return -EAGAIN;
>> +
>> +        current_mode = vpmu_mode;
>> +        vpmu_mode = pmu_params.val;
>> +
>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>> +        {
>> +            /*
>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>> +             * can be achieved by having all physical processors go through
>> +             * context_switch().
>> +             */
>> +            ret = vpmu_force_context_switch(arg);
>> +            if ( ret )
>> +                vpmu_mode = current_mode;
>> +        }
>> +        else
>> +            ret = 0;
>> +
>> +        spin_unlock(&xenpmu_mode_lock);
>> +        break;
> This still isn't safe: There's nothing preventing another vCPU to issue
> another XENPMU_mode_set operation while the one turning the vPMU
> off is still in the process of waiting, but having exited the lock protected
> region in order to allow other processing to occur.

I think that's OK: one of these two competing requests will timeout in 
the while loop of vpmu_force_context_switch(). It will, in fact, likely 
be the first caller because the second one will get right into the while 
loop, without setting up the waiting data. This is a little 
counter-intuitive but trying to set mode simultaneously from two 
different places is asking for trouble anyway.


> I think you simply
> need another mode "being-turned-off" during which only mode
> changes to XENPMU_MODE_OFF, and only by the originally requesting
> vCPU, are permitted (or else your "if true, we are in hypercall
> continuation" comment above wouldn't always be true either, as that
> second vCPU might also issue a second turn-off request).

-boris

  reply	other threads:[~2014-08-12 15:12 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
2014-08-11 13:28   ` Jan Beulich
2014-08-11 15:35     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-08-11 13:45   ` Jan Beulich
2014-08-11 16:01     ` Boris Ostrovsky
2014-08-11 16:13       ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 06/20] vmx: Merge MSR management routines Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-08-11 13:49   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-08-11 14:08   ` Jan Beulich
2014-08-11 16:15     ` Boris Ostrovsky
2014-08-18 16:02       ` Boris Ostrovsky
2014-08-18 20:23         ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-08-12 10:37   ` Jan Beulich
2014-08-12 15:12     ` Boris Ostrovsky [this message]
2014-08-12 15:35       ` Jan Beulich
2014-08-12 16:25         ` Boris Ostrovsky
2014-08-14 16:32           ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
2014-08-12 11:59   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
2014-08-12 12:45   ` Jan Beulich
2014-08-12 15:47     ` Boris Ostrovsky
2014-08-12 16:00       ` Jan Beulich
2014-08-12 16:30         ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
2014-08-12 12:55   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-08-12 12:58   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-08-12 13:06   ` Jan Beulich
2014-08-12 16:14     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-08-12 14:15   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-08-12 14:24   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2014-08-12 14:26   ` Jan Beulich
2014-08-11 13:32 ` [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53EA2EE6.7060100@oracle.com \
    --to=boris.ostrovsky@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.