Re: [PATCH] xen: add hypercall option to temporarily pin a vcpu

From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	dario.faggioli@citrix.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov
Subject: Re: [PATCH] xen: add hypercall option to temporarily pin a vcpu
Date: Fri, 26 Feb 2016 12:14:48 +0100
Message-ID: <56D033A8.7020009@suse.com>
In-Reply-To: <56D0395702000078000D69A6@suse.com>

On 26/02/16 11:39, Jan Beulich wrote:
>>>> On 25.02.16 at 17:50, <JGross@suse.com> wrote:
>> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> -                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    continue;
>> +                }
> 
> So far the function can only return 0 or -EAGAIN. By using "continue"
> here you will make it impossible for the caller to reliably determine
> whether possibly both things failed. Despite -EBUSY being a logical
> choice here, I think you'd better use -EAGAIN here too. And it needs
> to be determined whether continuing the loop in this as well as the
> pre-existing cases is actually the right thing to do.

EBUSY vs. EAGAIN: by returning EAGAIN I would signal to the Xen tools
that the hypervisor is currently not able to do the desired operation
(especially removing a cpu from a cpupool), but that the situation will
change automatically via scheduling. EBUSY will stop retries in the Xen
tools, and this is what I want here: I can't be sure the situation
will change soon.
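
A minimal sketch of that distinction from the caller's side, assuming a
tool-side helper that retries only on -EAGAIN. The names
remove_cpu_with_retry, remove_cpu_fn and fake_remove_cpu are hypothetical
and are not taken from the Xen tools; the fake operation only stands in
for the real hypercall wrapper:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type standing in for the hypercall wrapper. */
typedef int (*remove_cpu_fn)(void *ctx);

/*
 * Retry only while the operation reports -EAGAIN. Any other result,
 * including -EBUSY, is handed back to the caller immediately.
 */
static int remove_cpu_with_retry(remove_cpu_fn op, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    assert(max_tries > 0);

    for ( int i = 0; i < max_tries && ret == -EAGAIN; i++ )
        ret = op(ctx);

    return ret;
}

/* Illustration only: fails with -EAGAIN a few times, then settles. */
struct fake_op {
    int calls;          /* number of invocations so far */
    int eagain_before;  /* how many calls return -EAGAIN first */
    int final;          /* result returned afterwards */
};

static int fake_remove_cpu(void *ctx)
{
    struct fake_op *f = ctx;

    f->calls++;
    return f->calls <= f->eagain_before ? -EAGAIN : f->final;
}
```

With this policy a transient failure costs a few extra calls, while
-EBUSY is reported after the first attempt.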

Regarding continuation of the loop: I think you are right in the
EBUSY case: I should break out of the loop. I should not do so in the
EAGAIN case as I want to remove as many vcpus from the physical cpu as
possible without returning to the Xen tools in between.
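
The intended control flow could be sketched roughly as below. This is a
simplified model, not the code of cpu_disable_scheduler(): the enum, the
evacuate_cpu name and the array of states are assumptions made for
illustration, and locking is left out entirely:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-vcpu state used only by this model. */
enum vcpu_state {
    VCPU_MOVABLE,      /* can be moved off the cpu right away */
    VCPU_RUNNING,      /* cannot be moved now, will resolve by scheduling */
    VCPU_TEMP_PINNED   /* temporarily pinned, must not be moved */
};

/*
 * Walk all vcpus on the cpu being removed. A temporarily pinned vcpu
 * ends the walk with -EBUSY; a vcpu that cannot be moved yet records
 * -EAGAIN but the walk goes on, so as many vcpus as possible are moved
 * in a single pass.
 */
static int evacuate_cpu(const enum vcpu_state *vcpus, size_t n, size_t *moved)
{
    int ret = 0;

    assert(vcpus != NULL && moved != NULL);
    *moved = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( vcpus[i] == VCPU_TEMP_PINNED )
        {
            ret = -EBUSY;
            break;
        }

        if ( vcpus[i] == VCPU_RUNNING )
        {
            ret = -EAGAIN;
            continue;
        }

        (*moved)++;
    }

    return ret;
}
```

In this model -EBUSY always wins over an earlier -EAGAIN, so the caller
gets a single unambiguous result.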

> 
>> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>>                      v->affinity_broken = 1;
>>                  }
>>  
>> +                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
> 
> Wouldn't it be even better to make this the "else" to the
> preceding if(), since in the suspend case this is otherwise going
> to be printed for every vCPU not currently running on pCPU0?

Yes, I'll change it.

> 
>> @@ -753,14 +767,22 @@ static int vcpu_set_affinity(
>>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>>  {
>>      spinlock_t *lock;
>> +    int ret = 0;
>>  
>>      lock = vcpu_schedule_lock_irq(v);
>>  
>> -    cpumask_copy(which, affinity);
>> +    if ( v->affinity_broken )
>> +    {
>> +        ret = -EBUSY;
>> +    }
> 
> Unnecessary braces.

Will remove.

> 
>> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d)
>>          kill_timer(&d->watchdog_timer[i]);
>>  }
>>  
>> +static long do_pin_temp(int cpu)
>> +{
>> +    struct vcpu *v = current;
>> +    spinlock_t *lock;
>> +    long ret = -EINVAL;
>> +
>> +    lock = vcpu_schedule_lock_irq(v);
>> +
>> +    if ( cpu == -1 )
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
>> +            v->affinity_broken = 0;
>> +            set_bit(_VPF_migrating, &v->pause_flags);
>> +            ret = 0;
>> +        }
>> +    }
>> +    else if ( cpu < nr_cpu_ids && cpu >= 0 )
> 
> Perhaps easier to simply use "cpu < 0" in the first if()?

Okay.

> 
>> +    {
>> +        if ( v->affinity_broken )
>> +        {
>> +            ret = -EBUSY;
>> +        }
>> +        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
>> +        {
> 
> This is a rather ugly restriction: How would a caller fulfill its job
> when this is not the case?

It can't. We should document that, at least on hardware requiring this
functionality, it is a bad idea to remove cpu 0 from the cpupool
containing the hardware domain.

> 
>> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case SCHEDOP_pin_temp:
>> +    {
>> +        struct sched_pin_temp sched_pin_temp;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
>> +            break;
>> +
>> +        ret = xsm_schedop_pin_temp(XSM_PRIV);
>> +        if ( ret )
>> +            break;
>> +
>> +        ret = do_pin_temp(sched_pin_temp.pcpu);
>> +
>> +        break;
>> +    }
> 
> So having come here I still don't see why this is called "temp":
> Nothing enforces this to be a temporary state, and hence the
> sub-op name currently is actively misleading.

I've chosen this name as the old affinity is saved and can (and should)
be recovered later. So it is intended to be temporary.

>> --- a/xen/include/public/sched.h
>> +++ b/xen/include/public/sched.h
>> @@ -118,6 +118,15 @@
>>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>>   */
>>  #define SCHEDOP_watchdog    6
>> +
>> +/*
>> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
>> + * @arg == pointer to sched_pin_temp_t structure.
>> + *
>> + * Setting pcpu to -1 will undo a previous temporary pinning.
>> + * This call is allowed for domains with domain control privilege only.
>> + */
> 
> Why domain control privilege? I'd actually suggest limiting the
> ability to the hardware domain, at once eliminating the need
> for the XSM check.

Sure, I'd be happy to simplify the patch.

> 
>> +struct sched_pin_temp {
>> +    int pcpu;
> 
> Fixed width types only please in the public interface. Also this needs
> an entry in xen/include/xlat.lst, and a consumer of the resulting
> check macro.

Aah, okay.
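
For the fixed width type, something along these lines would be one
option. The choice of int32_t is an assumption, and the xlat.lst entry
and the consumer of the check macro are not shown:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the public structure with a fixed width field, so that the
 * layout does not depend on the guest's notion of "int".
 */
struct sched_pin_temp {
    int32_t pcpu;   /* physical cpu to pin to, negative to undo */
};
typedef struct sched_pin_temp sched_pin_temp_t;

/* The layout is the same for 32-bit and 64-bit callers. */
static_assert(sizeof(struct sched_pin_temp) == sizeof(int32_t),
              "sched_pin_temp must consist of exactly one 32-bit field");
```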

Thanks for the review,

Juergen