From: "Justin T. Weaver" <jtweaver@hawaii.edu>
To: xen-devel@lists.xen.org
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
"Justin T. Weaver" <jtweaver@hawaii.edu>,
henric@hawaii.edu
Subject: [PATCH v4 2/5] sched: credit2: respect per-vcpu hard affinity
Date: Sun, 12 Jul 2015 22:13:39 -1000
Message-ID: <1436775223-6397-3-git-send-email-jtweaver@hawaii.edu>
In-Reply-To: <1436775223-6397-1-git-send-email-jtweaver@hawaii.edu>
by making sure that vcpus only run on the pcpu(s) they are allowed to
run on based on their hard affinity cpu masks.
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Changes in v4:
* Renamed scratch_mask to _scratch_mask
* Renamed csched2_cpumask to scratch_mask
* Removed "else continue" in function choose_cpu's for_each_cpu loop to make
the code less confusing
* Added an ASSERT in function csched2_alloc_pdata verifying that
_scratch_mask[cpu] is NULL if the allocation fails
* Added assignment to NULL for _scratch_mask[cpu] after call to
free_cpumask_var in function csched2_alloc_pdata
* Changed allocation of _scratch_mask from using xmalloc_array back to using
xzalloc_array
* Moved allocation of _scratch_mask from function csched2_init to function
csched2_global_init
* Added comment to function csched2_vcpu_migrate explaining the need for the
vc->processor assignment after the else
* Modified comment before function get_fallback_cpu; reworded into bulleted
list
* Changed cpumask_any to cpumask_first at the end of function get_fallback_cpu
* Fixed indentation in function get_fallback_cpu to align with opening parens
* Changed function get_fallback_cpu to variant suggested in the v3 review
* Changed comment before function vcpu_is_migrateable; vcpu svc to just svc
* Changed "run queue" in several comments to "runqueue"
* Renamed function valid_vcpu_migration to vcpu_is_migrateable
* Made condition check in function vcpu_is_migrateable "positive"
Changes in v3:
(all changes are based on v2 review comments unless noted)
* Renamed cpumask to scratch_mask
* Renamed function get_safe_pcpu to get_fallback_cpu
* Improved comment for function get_fallback_cpu
* Replaced cpupool_online_cpumask with VCPU2ONLINE in function
get_fallback_cpu to shorten the line
* Added #define for VCPU2ONLINE (probably should be factored out of
schedule.c and here, and put into a common header)
* Modified code in function get_fallback_cpu: moved check for current
processor to the top; added an ASSERT because the mask should not be empty
* Modified code and comment in function choose_cpu in migrate request section
* Added comment to function choose_cpu explaining why the vcpu passed to the
function might not have hard affinity with any of the pcpus in its assigned
run queue
* Modified code in function choose_cpu to make it more readable
* Moved/changed "We didn't find ..." comment in function choose_cpu
* Combined migration flag check and hard affinity check into valid migration
check helper function; replaced code in three places in function
balance_load with call to the helper function
* Changed a BUG_ON to an ASSERT in function csched2_vcpu_migrate
* Moved vc->processor assignment in function csched2_vcpu_migrate to an else
block to execute only if current and destination run queues are the same;
Note: without the processor assignment here the vcpu might be assigned to a
processor it no longer is allowed to run on. In that case, function
runq_candidate may only get called for the vcpu's old processor, and
runq_candidate will no longer let a vcpu run on a processor that it's not
allowed to run on (because of the hard affinity check first introduced in
v1 of this patch).
* csched2_init: changed xzalloc_bytes to xmalloc_array for allocation of
scratch_mask
* csched2_deinit: removed scratch_mask freeing loop; it wasn't needed
Changes in v2:
* Added dynamically allocated cpu masks to avoid putting them on the stack;
replaced temp masks from v1 throughout (a condensed sketch of this scheme
appears at the end of these notes)
* Added helper function for code suggested in v1 review and called it in two
locations in function choose_cpu
* Removed v1 change to comment in the beginning of choose_cpu
* Replaced two instances of cpumask_and/cpumask_empty with cpumask_intersects
* Removed v1 re-work of code in function migrate; only change in migrate in
v2 is the assignment of a valid pcpu from the destination run queue to
vc->processor
* In function csched2_vcpu_migrate: removed change from v1 that called
function migrate even if cur and dest run queues were the same in order
to get a runq_tickle call; added processor assignment to new_cpu to fix
the real underlying issue which was the vcpu not getting a call to
sched_move_irqs
* Removed the looping added in v1 in function balance_load; it may be added
back later because it would make balance_load more aware of hard affinity,
but it is not needed for credit2 to respect hard affinity.
* Removed coding style fix in function balance_load
* Improved comment in function runq_candidate
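For reference, below is a condensed sketch of the per-pcpu scratch mask scheme
the notes above describe: one dynamically allocated cpumask per pcpu, indexed
via smp_processor_id(), so that no cpumask_t ever has to live on the stack.
This is an illustration only, not part of the patch; the sketch_* wrappers are
hypothetical stand-ins for the csched2_* hooks touched in the diff that follows.

static cpumask_t **_scratch_mask = NULL;
#define scratch_mask _scratch_mask[smp_processor_id()]

/* cf. csched2_global_init(): allocate the array of per-pcpu pointers. */
static int sketch_global_init(void)
{
    _scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
    return _scratch_mask ? 0 : -ENOMEM;
}

/* cf. csched2_alloc_pdata(): allocate one scratch cpumask per pcpu. */
static int sketch_alloc_pdata(int cpu)
{
    if ( !zalloc_cpumask_var(&_scratch_mask[cpu]) )
        return -ENOMEM;
    return 0;
}

/* Free a pcpu's scratch cpumask and reset the pointer so a repeated
 * free (or a later re-allocation) for the same pcpu stays safe. */
static void sketch_free_pdata(int cpu)
{
    free_cpumask_var(_scratch_mask[cpu]);
    _scratch_mask[cpu] = NULL;
}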
---
xen/common/sched_credit2.c | 153 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 125 insertions(+), 28 deletions(-)
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 75e0321..42a1097 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
integer_param("credit2_balance_over", opt_overload_balance_tolerance);
/*
+ * Use this to avoid having too many cpumask_t structs on the stack
+ */
+static cpumask_t **_scratch_mask = NULL;
+#define scratch_mask _scratch_mask[smp_processor_id()]
+
+/*
* Per-runqueue data
*/
struct csched2_runqueue_data {
@@ -268,6 +274,38 @@ struct csched2_dom {
uint16_t nr_vcpus;
};
+/*
+ * When a hard affinity change occurs, we may not be able to check some or
+ * all of the other runqueues for a valid new processor for the given vcpu
+ * because (in function choose_cpu) either the trylock on the private data
+ * failed or the trylock on each runqueue with valid processor(s) for the
+ * vcpu failed. In these cases, this function is used to pick a pcpu that svc
+ * can run on.
+ *
+ * Function returns a valid pcpu for svc, in order of preference:
+ * - svc's current pcpu;
+ * - another pcpu from svc's current runq;
+ * - an online pcpu in svc's domain's cpupool, and in svc's hard affinity;
+ */
+static int get_fallback_cpu(struct csched2_vcpu *svc)
+{
+ int cpu;
+
+ if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+ svc->vcpu->cpu_hard_affinity)) )
+ return svc->vcpu->processor;
+
+ cpumask_and(scratch_mask, svc->vcpu->cpu_hard_affinity,
+ &svc->rqd->active);
+ cpu = cpumask_first(scratch_mask);
+ if ( likely(cpu < nr_cpu_ids) )
+ return cpu;
+
+ cpumask_and(scratch_mask, svc->vcpu->cpu_hard_affinity,
+ VCPU2ONLINE(svc->vcpu));
+ ASSERT( !cpumask_empty(scratch_mask) );
+ return cpumask_first(scratch_mask);
+}
/*
* Time-to-credit, credit-to-time.
@@ -501,8 +539,9 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
goto tickle;
}
- /* Get a mask of idle, but not tickled */
+ /* Get a mask of idle, but not tickled, that new is allowed to run on. */
cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+ cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
/* If it's not empty, choose one */
i = cpumask_cycle(cpu, &mask);
@@ -513,9 +552,11 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
}
/* Otherwise, look for the non-idle cpu with the lowest credit,
- * skipping cpus which have been tickled but not scheduled yet */
+ * skipping cpus which have been tickled but not scheduled yet,
+ * that new is allowed to run on. */
cpumask_andnot(&mask, &rqd->active, &rqd->idle);
cpumask_andnot(&mask, &mask, &rqd->tickled);
+ cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
for_each_cpu(i, &mask)
{
@@ -1078,9 +1119,8 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
d2printk("%pv -\n", svc->vcpu);
clear_bit(__CSFLAG_runq_migrate_request, &svc->flags);
}
- /* Leave it where it is for now. When we actually pay attention
- * to affinity we'll have to figure something out... */
- return vc->processor;
+
+ return get_fallback_cpu(svc);
}
/* First check to see if we're here because someone else suggested a place
@@ -1091,45 +1131,55 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
{
printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
__func__);
- /* Fall-through to normal cpu pick */
}
else
{
- d2printk("%pv +\n", svc->vcpu);
- new_cpu = cpumask_cycle(vc->processor, &svc->migrate_rqd->active);
- goto out_up;
+ cpumask_and(scratch_mask, vc->cpu_hard_affinity,
+ &svc->migrate_rqd->active);
+ new_cpu = cpumask_any(scratch_mask);
+ if ( new_cpu < nr_cpu_ids )
+ {
+ d2printk("%pv +\n", svc->vcpu);
+ goto out_up;
+ }
}
+ /* Fall-through to normal cpu pick */
}
- /* FIXME: Pay attention to cpu affinity */
-
min_avgload = MAX_LOAD;
/* Find the runqueue with the lowest instantaneous load */
for_each_cpu(i, &prv->active_queues)
{
struct csched2_runqueue_data *rqd;
- s_time_t rqd_avgload;
+ s_time_t rqd_avgload = MAX_LOAD;
rqd = prv->rqd + i;
/* If checking a different runqueue, grab the lock,
- * read the avg, and then release the lock.
+ * check hard affinity, read the avg, and then release the lock.
*
* If on our own runqueue, don't grab or release the lock;
* but subtract our own load from the runqueue load to simulate
- * impartiality */
+ * impartiality.
+ *
+ * svc's hard affinity may have changed; this function is the
+ * credit 2 scheduler's first opportunity to react to the change,
+ * so it is possible here that svc does not have hard affinity
+ * with any of the pcpus of svc's currently assigned runqueue.
+ */
if ( rqd == svc->rqd )
{
- rqd_avgload = rqd->b_avgload - svc->avgload;
+ if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ rqd_avgload = rqd->b_avgload - svc->avgload;
}
else if ( spin_trylock(&rqd->lock) )
{
- rqd_avgload = rqd->b_avgload;
+ if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ rqd_avgload = rqd->b_avgload;
+
spin_unlock(&rqd->lock);
}
- else
- continue;
if ( rqd_avgload < min_avgload )
{
@@ -1138,12 +1188,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
}
}
- /* We didn't find anyone (most likely because of spinlock contention); leave it where it is */
+ /* We didn't find anyone (most likely because of spinlock contention). */
if ( min_rqi == -1 )
- new_cpu = vc->processor;
+ new_cpu = get_fallback_cpu(svc);
else
{
- new_cpu = cpumask_cycle(vc->processor, &prv->rqd[min_rqi].active);
+ cpumask_and(scratch_mask, vc->cpu_hard_affinity,
+ &prv->rqd[min_rqi].active);
+ new_cpu = cpumask_any(scratch_mask);
BUG_ON(new_cpu >= nr_cpu_ids);
}
@@ -1223,7 +1275,12 @@ static void migrate(const struct scheduler *ops,
on_runq=1;
}
__runq_deassign(svc);
- svc->vcpu->processor = cpumask_any(&trqd->active);
+
+ cpumask_and(scratch_mask, svc->vcpu->cpu_hard_affinity,
+ &trqd->active);
+ svc->vcpu->processor = cpumask_any(scratch_mask);
+ BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
+
__runq_assign(svc, trqd);
if ( on_runq )
{
@@ -1237,6 +1294,17 @@ static void migrate(const struct scheduler *ops,
}
}
+/*
+ * Migration of svc to runqueue rqd is a valid option if svc is not already
+ * flagged to migrate and if svc is allowed to run on at least one of the
+ * pcpus assigned to rqd based on svc's hard affinity mask.
+ */
+static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
+ struct csched2_runqueue_data *rqd)
+{
+ return !test_bit(__CSFLAG_runq_migrate_request, &svc->flags)
+ && cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+}
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
{
@@ -1345,8 +1413,7 @@ retry:
__update_svc_load(ops, push_svc, 0, now);
- /* Skip this one if it's already been flagged to migrate */
- if ( test_bit(__CSFLAG_runq_migrate_request, &push_svc->flags) )
+ if ( !vcpu_is_migrateable(push_svc, st.orqd) )
continue;
list_for_each( pull_iter, &st.orqd->svc )
@@ -1358,8 +1425,7 @@ retry:
__update_svc_load(ops, pull_svc, 0, now);
}
- /* Skip this one if it's already been flagged to migrate */
- if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
+ if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
continue;
consider(&st, push_svc, pull_svc);
@@ -1375,8 +1441,7 @@ retry:
{
struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
- /* Skip this one if it's already been flagged to migrate */
- if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
+ if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
continue;
/* Consider pull only */
@@ -1415,11 +1480,20 @@ csched2_vcpu_migrate(
/* Check if new_cpu is valid */
BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
+ ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
trqd = RQD(ops, new_cpu);
+ /*
+ * Without the processor assignment after the else, vc may be assigned to
+ * a processor it is not allowed to run on. In that case, runq_candidate
+ * might only get called for the old cpu, and vc will not get to run due
+ * to the hard affinity check.
+ */
if ( trqd != svc->rqd )
migrate(ops, svc, trqd, NOW());
+ else
+ vc->processor = new_cpu;
}
static int
@@ -1638,6 +1712,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
{
struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+ /* Only consider vcpus that are allowed to run on this processor. */
+ if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+ continue;
+
/* If this is on a different processor, don't pull it unless
* its credit is at least CSCHED2_MIGRATE_RESIST higher. */
if ( svc->vcpu->processor != cpu
@@ -2047,6 +2125,9 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
spin_unlock_irqrestore(&prv->lock, flags);
+ free_cpumask_var(_scratch_mask[cpu]);
+ _scratch_mask[cpu] = NULL;
+
return;
}
@@ -2061,6 +2142,16 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
printk("%s: cpu %d not online yet, deferring initializatgion\n",
__func__, cpu);
+ /*
+ * For each new pcpu, allocate a cpumask_t for use throughout the
+ * scheduler to avoid putting any cpumask_t structs on the stack.
+ */
+ if ( !zalloc_cpumask_var(&_scratch_mask[cpu]) )
+ {
+ ASSERT(_scratch_mask[cpu] == NULL);
+ return NULL;
+ }
+
return (void *)1;
}
@@ -2151,6 +2242,10 @@ static struct notifier_block cpu_credit2_nfb = {
static int
csched2_global_init(void)
{
+ _scratch_mask = xzalloc_array(cpumask_t *, nr_cpu_ids);
+ if ( _scratch_mask == NULL )
+ return -ENOMEM;
+
register_cpu_notifier(&cpu_credit2_nfb);
return 0;
}
@@ -2206,6 +2301,8 @@ csched2_deinit(const struct scheduler *ops)
prv = CSCHED2_PRIV(ops);
xfree(prv);
+
+ xfree(_scratch_mask);
}
--
1.7.10.4
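The hunks above repeat one idiom: intersect whatever candidate pcpu mask is
being considered (idle-but-not-tickled cpus in runq_tickle, a runqueue's
active mask in choose_cpu and migrate) with the vcpu's hard affinity mask, and
only pick a pcpu from that intersection; when nothing is left, the code either
falls back via get_fallback_cpu() or treats it as a bug. A minimal sketch of
that idiom follows, assuming the per-pcpu scratch_mask from the patch is
available; the helper name and its -1 fallback return are hypothetical, for
illustration only.

/* Restrict 'candidates' to the pcpus v is allowed to run on, then pick one. */
static int pick_cpu_in_hard_affinity(const cpumask_t *candidates,
                                     const struct vcpu *v)
{
    int cpu;

    /* scratch_mask is this pcpu's pre-allocated scratch cpumask. */
    cpumask_and(scratch_mask, candidates, v->cpu_hard_affinity);

    /* cpumask_any() returns >= nr_cpu_ids when the mask is empty. */
    cpu = cpumask_any(scratch_mask);
    if ( cpu < nr_cpu_ids )
        return cpu;

    /* No overlap: the caller would fall back, e.g. via get_fallback_cpu(). */
    return -1;
}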
Thread overview:
2015-07-13 8:13 [PATCH v4 0/5] sched: credit2: introduce per-vcpu hard and soft affinity Justin T. Weaver
2015-07-13 8:13 ` [PATCH v4 1/5] sched: factor out VCPU2ONLINE to common header file Justin T. Weaver
2015-09-17 15:26 ` Dario Faggioli
2015-07-13 8:13 ` Justin T. Weaver [this message]
2015-09-18 22:12 ` [PATCH v4 2/5] sched: credit2: respect per-vcpu hard affinity Dario Faggioli
2015-07-13 8:13 ` [PATCH v4 3/5] sched: factor out per-vcpu affinity related code to common header file Justin T. Weaver
2015-07-13 8:13 ` [PATCH v4 4/5] sched: credit2: add soft affinity awareness to function get_fallback_cpu Justin T. Weaver
2015-07-13 8:13 ` [PATCH v4 5/5] sched: credit2: add soft affinity awareness to function runq_tickle Justin T. Weaver
2015-07-13 15:43 ` [PATCH v4 0/5] sched: credit2: introduce per-vcpu hard and soft affinity Dario Faggioli
2015-09-14 9:03 ` Dario Faggioli