From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
dario.faggioli@citrix.com, ian.jackson@eu.citrix.com,
xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov
Subject: Re: [PATCH] xen: add hypercall option to temporarily pin a vcpu
Date: Fri, 26 Feb 2016 12:14:48 +0100
Message-ID: <56D033A8.7020009@suse.com>
In-Reply-To: <56D0395702000078000D69A6@suse.com>
On 26/02/16 11:39, Jan Beulich wrote:
>>>> On 25.02.16 at 17:50, <JGross@suse.com> wrote:
>> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu)
>> if ( cpumask_empty(&online_affinity) &&
>> cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>> {
>> - printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
>> + if ( v->affinity_broken )
>> + {
>> + /* The vcpu is temporarily pinned, can't move it. */
>> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> + ret = -EBUSY;
>> + continue;
>> + }
>
> So far the function can only return 0 or -EAGAIN. By using "continue"
> here you will make it impossible for the caller to reliably determine
> whether possibly both things failed. Despite -EBUSY being a logical
> choice here, I think you'd better use -EAGAIN here too. And it needs
> to be determined whether continuing the loop in this as well as the
> pre-existing cases is actually the right thing to do.
EBUSY vs. EAGAIN: by returning EAGAIN I would signal to the Xen tools that
the hypervisor is currently not able to do the desired operation
(especially removing a cpu from a cpupool), but that the situation will
change automatically via scheduling. EBUSY will stop retries in the Xen
tools, and this is what I want here: I can't be sure the situation
will change soon.
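To make the distinction concrete, here is a minimal sketch of a caller-side retry policy. It is not taken from the Xen tools: `retry_while_eagain`, `cpu_op_fn`, and the two example operations are hypothetical names used only for illustration. The point it shows is that -EAGAIN is treated as transient and retried, while -EBUSY is handed back to the caller at once.

```c
#include <errno.h>

/* Hypothetical operation type: returns 0 or a negative errno value. */
typedef int (*cpu_op_fn)(void *ctx);

/*
 * Retry an operation only while it reports -EAGAIN (assumed transient,
 * expected to resolve through scheduling). Any other result, including
 * -EBUSY, is returned immediately without further attempts.
 */
static int retry_while_eagain(cpu_op_fn op, void *ctx, unsigned int max_tries)
{
    int ret = -EAGAIN;

    for ( unsigned int i = 0; i < max_tries && ret == -EAGAIN; i++ )
        ret = op(ctx);

    return ret;
}

/* Example op: reports -EAGAIN until the counter in ctx reaches zero. */
static int transient_op(void *ctx)
{
    int *remaining = ctx;

    if ( *remaining > 0 )
    {
        (*remaining)--;
        return -EAGAIN;
    }

    return 0;
}

/* Example op: counts its calls in ctx and always reports -EBUSY. */
static int pinned_op(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return -EBUSY;
}
```

With this policy, `pinned_op` is called exactly once however large `max_tries` is, which is the behaviour wanted for a temporarily pinned vcpu.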
Regarding continuation of the loop: I think you are right in the
EBUSY case: I should break out of the loop. I should not do so in the
EAGAIN case as I want to remove as many vcpus from the physical cpu as
possible without returning to the Xen tools in between.
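The loop policy described here can be sketched as a small standalone model. This is not the cpu_disable_scheduler() code: `model_vcpu`, `vcpu_state`, and `evacuate_cpu` are hypothetical stand-ins, and the real function also handles locking and affinity masks, which are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a vcpu; not the Xen structure. */
enum vcpu_state { VCPU_MOVABLE, VCPU_MIGRATING, VCPU_TEMP_PINNED };

struct model_vcpu {
    enum vcpu_state state;
    int moved;              /* set once the vcpu has left the cpu */
};

/*
 * Model of the loop policy: a temporarily pinned vcpu ends the loop with
 * -EBUSY, while a vcpu that cannot be moved right now only records
 * -EAGAIN and the loop carries on with the remaining vcpus.
 */
static int evacuate_cpu(struct model_vcpu *vcpus, size_t n)
{
    int ret = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        struct model_vcpu *v = &vcpus[i];

        if ( v->state == VCPU_TEMP_PINNED )
        {
            ret = -EBUSY;   /* retrying will not help: stop here */
            break;
        }

        if ( v->state == VCPU_MIGRATING )
        {
            ret = -EAGAIN;  /* transient: keep moving the others */
            continue;
        }

        v->moved = 1;
    }

    return ret;
}
```

Because the -EBUSY case leaves the loop immediately, a later -EAGAIN can never overwrite it, so the caller can tell the two outcomes apart.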
>
>> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>> v->affinity_broken = 1;
>> }
>>
>> + printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
>
> Wouldn't it be even better to make this the "else" to the
> preceding if(), since in the suspend case this is otherwise going
> to be printed for every vCPU not currently running on pCPU0?
Yes, I'll change it.
>
>> @@ -753,14 +767,22 @@ static int vcpu_set_affinity(
>> struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>> {
>> spinlock_t *lock;
>> + int ret = 0;
>>
>> lock = vcpu_schedule_lock_irq(v);
>>
>> - cpumask_copy(which, affinity);
>> + if ( v->affinity_broken )
>> + {
>> + ret = -EBUSY;
>> + }
>
> Unnecessary braces.
Will remove.
>
>> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d)
>> kill_timer(&d->watchdog_timer[i]);
>> }
>>
>> +static long do_pin_temp(int cpu)
>> +{
>> + struct vcpu *v = current;
>> + spinlock_t *lock;
>> + long ret = -EINVAL;
>> +
>> + lock = vcpu_schedule_lock_irq(v);
>> +
>> + if ( cpu == -1 )
>> + {
>> + if ( v->affinity_broken )
>> + {
>> + cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
>> + v->affinity_broken = 0;
>> + set_bit(_VPF_migrating, &v->pause_flags);
>> + ret = 0;
>> + }
>> + }
>> + else if ( cpu < nr_cpu_ids && cpu >= 0 )
>
> Perhaps easier to simply use "cpu < 0" in the first if()?
Okay.
>
>> + {
>> + if ( v->affinity_broken )
>> + {
>> + ret = -EBUSY;
>> + }
>> + else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
>> + {
>
> This is a rather ugly restriction: How would a caller fulfill its job
> when this is not the case?
He can't. We should document that, at least on hardware requiring this
functionality, it is a bad idea to remove cpu 0 from the cpupool
containing the hardware domain.
>
>> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> break;
>> }
>>
>> + case SCHEDOP_pin_temp:
>> + {
>> + struct sched_pin_temp sched_pin_temp;
>> +
>> + ret = -EFAULT;
>> + if ( copy_from_guest(&sched_pin_temp, arg, 1) )
>> + break;
>> +
>> + ret = xsm_schedop_pin_temp(XSM_PRIV);
>> + if ( ret )
>> + break;
>> +
>> + ret = do_pin_temp(sched_pin_temp.pcpu);
>> +
>> + break;
>> + }
>
> So having come here I still don't see why this is called "temp":
> Nothing enforces this to be a temporary state, and hence the
> sub-op name currently is actively misleading.
I've chosen this name as the old affinity is saved and can (and should)
be recovered later. So it is intended to be temporary.
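The save-and-restore idea behind the name can be sketched as a standalone model. This is not the patch code: `pin_model_vcpu`, `model_pin_temp`, and `MODEL_NR_CPUS` are hypothetical, a 64-bit mask stands in for cpumask_t, and the -EINVAL result for a cpu that is not online is an assumption, since that part of the patch is not quoted above.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64    /* assumed limit of this model only */

struct pin_model_vcpu {
    uint64_t hard_affinity;        /* one bit per physical cpu */
    uint64_t hard_affinity_saved;  /* meaningful only while pinned */
    int      affinity_broken;      /* non-zero while temporarily pinned */
};

/*
 * cpu < 0 undoes a previous temporary pinning by restoring the saved
 * affinity; cpu >= 0 saves the current affinity and pins to that cpu.
 */
static long model_pin_temp(struct pin_model_vcpu *v, int cpu, uint64_t online)
{
    if ( cpu < 0 )
    {
        if ( !v->affinity_broken )
            return -EINVAL;         /* nothing to undo */

        v->hard_affinity = v->hard_affinity_saved;
        v->affinity_broken = 0;
        return 0;
    }

    if ( cpu >= MODEL_NR_CPUS )
        return -EINVAL;

    if ( v->affinity_broken )
        return -EBUSY;              /* already temporarily pinned */

    if ( !(online & (UINT64_C(1) << cpu)) )
        return -EINVAL;             /* assumption: target must be online */

    v->hard_affinity_saved = v->hard_affinity;
    v->hard_affinity = UINT64_C(1) << cpu;
    v->affinity_broken = 1;
    return 0;
}
```

In this model nothing forces the caller to undo the pinning, which is the gap the review comment points at; the state is temporary only in the sense that the previous affinity stays recoverable.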
>> --- a/xen/include/public/sched.h
>> +++ b/xen/include/public/sched.h
>> @@ -118,6 +118,15 @@
>> * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>> */
>> #define SCHEDOP_watchdog 6
>> +
>> +/*
>> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
>> + * @arg == pointer to sched_pin_temp_t structure.
>> + *
>> + * Setting pcpu to -1 will undo a previous temporary pinning.
>> + * This call is allowed for domains with domain control privilege only.
>> + */
>
> Why domain control privilege? I'd actually suggest limiting the
> ability to the hardware domain, at once eliminating the need
> for the XSM check.
Sure, I'd be happy to simplify the patch.
>
>> +struct sched_pin_temp {
>> + int pcpu;
>
> Fixed width types only please in the public interface. Also this needs
> an entry in xen/include/xlat.lst, and a consumer of the resulting
> check macro.
Aah, okay.
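For the fixed-width point, a minimal sketch of how the structure could look. The name `sched_pin_temp_fixed` is hypothetical and this is not the final interface; it only shows the field with an explicit 32-bit type, so the layout no longer depends on the width of the compiler's `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the argument structure using a fixed-width field. */
struct sched_pin_temp_fixed {
    int32_t pcpu;   /* physical cpu to pin to; negative undoes the pinning */
};

/* The layout is the same for every guest ABI: exactly four bytes. */
static_assert(sizeof(struct sched_pin_temp_fixed) == 4,
              "sched_pin_temp_fixed must be exactly 4 bytes");
```

A signed type is kept here because the interface uses -1 as the "undo" value.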
Thanks for the review,
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel