Re: [PATCH] xen: add hypercall option to temporarily pin a vcpu

From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	dario.faggioli@citrix.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov
Subject: Re: [PATCH] xen: add hypercall option to temporarily pin a vcpu
Date: Fri, 26 Feb 2016 12:14:48 +0100
Message-ID: <56D033A8.7020009@suse.com>
In-Reply-To: <56D0395702000078000D69A6@suse.com>

On 26/02/16 11:39, Jan Beulich wrote:
>>>> On 25.02.16 at 17:50, <JGross@suse.com> wrote:
>> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> -                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    continue;
>> +                }
> 
> So far the function can only return 0 or -EAGAIN. By using "continue"
> here you will make it impossible for the caller to reliably determine
> whether possibly both things failed. Despite -EBUSY being a logical
> choice here, I think you'd better use -EAGAIN here too. And it needs
> to be determined whether continuing the loop in this as well as the
> pre-existing cases is actually the right thing to do.

EBUSY vs. EAGAIN: by returning EAGAIN I would signal to Xen tools that
the hypervisor is currently not able to do the desired operation
(especially removing a cpu from a cpupool), but the situation will
change automatically via scheduling. EBUSY will stop retries in Xen
tools and this is what I want here: I can't be sure the situation
will change soon.
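
To illustrate the distinction, here is a minimal, self-contained sketch of a
caller that retries only on -EAGAIN and gives up at once on -EBUSY. This is
not the Xen tools code: retry_on_eagain(), op_fn and the two example
operations are hypothetical names made up for this illustration.

```c
#include <errno.h>

/* Hypothetical operation type: returns 0 or a negative errno value. */
typedef int (*op_fn)(void *ctx);

/*
 * Call op repeatedly while it reports -EAGAIN, at most max_tries times.
 * Any other result, including -EBUSY, is handed back to the caller
 * immediately without a further attempt.
 */
static int retry_on_eagain(op_fn op, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    for ( int i = 0; i < max_tries && ret == -EAGAIN; i++ )
        ret = op(ctx);

    return ret;
}

/* Example operation: counts its calls and always reports -EBUSY. */
static int op_busy(void *ctx)
{
    ++*(int *)ctx;
    return -EBUSY;
}

/* Example operation: reports -EAGAIN on the first two calls, then 0. */
static int op_again_twice(void *ctx)
{
    return ++*(int *)ctx <= 2 ? -EAGAIN : 0;
}
```

With op_busy the helper returns after a single call, while op_again_twice is
called three times before success is reported.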

Regarding continuation of the loop: I think you are right in the
EBUSY case: I should break out of the loop. I should not do so in the
EAGAIN case as I want to remove as many vcpus from the physical cpu as
possible without returning to the Xen tools in between.
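
The loop behaviour I have in mind can be modelled roughly as below. This is a
simplified sketch, not the real cpu_disable_scheduler(): the enum, the
function name model_evacuate() and the per-vcpu states are assumptions made
for illustration only. Breaking out on the pinned case also means a later
-EAGAIN cannot overwrite the -EBUSY result.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-vcpu states for this model. */
enum model_vcpu_state {
    MODEL_MOVABLE,   /* can be moved away from the cpu right now */
    MODEL_RUNNING,   /* will move later via scheduling (-EAGAIN case) */
    MODEL_PINNED     /* temporarily pinned, cannot be moved (-EBUSY case) */
};

/*
 * Walk all vcpus on a cpu. Stop at the first pinned vcpu and report -EBUSY.
 * For vcpus that cannot be moved yet, remember -EAGAIN but keep going so as
 * many vcpus as possible are moved in one pass.
 */
static int model_evacuate(const enum model_vcpu_state *v, size_t n,
                          size_t *moved)
{
    int ret = 0;

    *moved = 0;
    for ( size_t i = 0; i < n; i++ )
    {
        if ( v[i] == MODEL_PINNED )
        {
            ret = -EBUSY;
            break;
        }
        if ( v[i] == MODEL_RUNNING )
        {
            ret = -EAGAIN;
            continue;
        }
        ++*moved;
    }

    return ret;
}
```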

> 
>> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>>                      v->affinity_broken = 1;
>>                  }
>>  
>> +                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
> 
> Wouldn't it be even better to make this the "else" to the
> preceding if(), since in the suspend case this is otherwise going
> to be printed for every vCPU not currently running on pCPU0?

Yes, I'll change it.

> 
>> @@ -753,14 +767,22 @@ static int vcpu_set_affinity(
>>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>>  {
>>      spinlock_t *lock;
>> +    int ret = 0;
>>  
>>      lock = vcpu_schedule_lock_irq(v);
>>  
>> -    cpumask_copy(which, affinity);
>> +    if ( v->affinity_broken )
>> +    {
>> +        ret = -EBUSY;
>> +    }
> 
> Unnecessary braces.

Will remove.

> 
>> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d)
>>          kill_timer(&d->watchdog_timer[i]);
>>  }
>>  
>> +static long do_pin_temp(int cpu)
>> +{
>> +    struct vcpu *v = current;
>> +    spinlock_t *lock;
>> +    long ret = -EINVAL;
>> +
>> +    lock = vcpu_schedule_lock_irq(v);
>> +
>> +    if ( cpu == -1 )
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
>> +            v->affinity_broken = 0;
>> +            set_bit(_VPF_migrating, &v->pause_flags);
>> +            ret = 0;
>> +        }
>> +    }
>> +    else if ( cpu < nr_cpu_ids && cpu >= 0 )
> 
> Perhaps easier to simply use "cpu < 0" in the first if()?

Okay.

> 
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            ret = -EBUSY;
>> +        }
>> +        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
>> +        {
> 
> This is a rather ugly restriction: How would a caller fulfill its job
> when this is not the case?

It can't. We should document that, at least on hardware requiring this
functionality it is a bad idea to remove cpu 0 from the cpupool with the
hardware domain.

> 
>> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case SCHEDOP_pin_temp:
>> +    {
>> +        struct sched_pin_temp sched_pin_temp;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
>> +            break;
>> +
>> +        ret = xsm_schedop_pin_temp(XSM_PRIV);
>> +        if ( ret )
>> +            break;
>> +
>> +        ret = do_pin_temp(sched_pin_temp.pcpu);
>> +
>> +        break;
>> +    }
> 
> So having come here I still don't see why this is called "temp":
> Nothing enforces this to be a temporary state, and hence the
> sub-op name currently is actively misleading.

I've chosen this name as the old affinity is saved and can (and should)
be recovered later. So it is intended to be temporary.
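
The save/restore idea can be sketched outside the hypervisor as follows. This
is a toy model under stated assumptions: affinities are plain 64-bit masks,
struct toy_vcpu and toy_pin_temp() are hypothetical names, and the pinning
branch is my reading of the intent since the quoted hunk stops before it. It
uses the "cpu < 0" test suggested above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the affinity-related fields of a vcpu. */
struct toy_vcpu {
    uint64_t hard_affinity;        /* current hard affinity mask */
    uint64_t hard_affinity_saved;  /* affinity to restore on unpin */
    bool affinity_broken;          /* true while temporarily pinned */
};

/*
 * cpu < 0: undo a previous temporary pinning (-EINVAL if there is none).
 * cpu >= 0: pin to that cpu, saving the old affinity first; -EBUSY if a
 * temporary pinning is already active, -EINVAL if the cpu is out of range
 * or not set in the online mask.
 */
static int toy_pin_temp(struct toy_vcpu *v, int cpu, uint64_t online)
{
    if ( cpu < 0 )
    {
        if ( !v->affinity_broken )
            return -EINVAL;
        v->hard_affinity = v->hard_affinity_saved;
        v->affinity_broken = false;
        return 0;
    }

    if ( cpu >= 64 )
        return -EINVAL;
    if ( v->affinity_broken )
        return -EBUSY;
    if ( !(online & (UINT64_C(1) << cpu)) )
        return -EINVAL;

    v->hard_affinity_saved = v->hard_affinity;
    v->hard_affinity = UINT64_C(1) << cpu;
    v->affinity_broken = true;
    return 0;
}
```

Nothing in this model forces the caller to undo the pinning, which is the
point raised in the review; the state is only recoverable, not time-limited.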

>> --- a/xen/include/public/sched.h
>> +++ b/xen/include/public/sched.h
>> @@ -118,6 +118,15 @@
>>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>>   */
>>  #define SCHEDOP_watchdog    6
>> +
>> +/*
>> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
>> + * @arg == pointer to sched_pin_temp_t structure.
>> + *
>> + * Setting pcpu to -1 will undo a previous temporary pinning.
>> + * This call is allowed for domains with domain control privilege only.
>> + */
> 
> Why domain control privilege? I'd actually suggest limiting the
> ability to the hardware domain, at once eliminating the need
> for the XSM check.

Sure, I'd be happy to simplify the patch.

> 
>> +struct sched_pin_temp {
>> +    int pcpu;
> 
> Fixed width types only please in the public interface. Also this needs
> an entry in xen/include/xlat.lst, and a consumer of the resulting
> check macro.

Aah, okay.
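
For the fixed width part, something along these lines is what I understand
is being asked for. It is only a sketch: the struct name pin_temp_arg_sketch
is a placeholder, and the xlat.lst entry and check macro are not shown here.

```c
#include <stdint.h>

/*
 * Sketch of the argument structure using a fixed width type, so that the
 * layout does not depend on the size of "int" in the guest.
 */
struct pin_temp_arg_sketch {
    int32_t pcpu;   /* physical cpu to pin to; negative undoes the pinning */
};

/* The layout is expected to be exactly four bytes on any ABI. */
_Static_assert(sizeof(struct pin_temp_arg_sketch) == 4,
               "argument structure must be 4 bytes");
```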

Thanks for the review,

Juergen
