* Re: [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing
[not found] <E1Fjbdr-0005cU-OD@xenbits.xensource.com>
@ 2006-05-26 14:00 ` Matt Ayres
2006-05-26 14:15 ` Keir Fraser
2006-05-26 15:51 ` Anthony Liguori
1 sibling, 1 reply; 4+ messages in thread
From: Matt Ayres @ 2006-05-26 14:00 UTC (permalink / raw)
To: xen-devel
Don't you think things like this should be committed with a readme file?
Xen patchbot-unstable wrote:
> # HG changeset patch
> # User ack@kneesa.uk.xensource.com
> # Node ID e539abd27a0f2b1a64b4d129a10748d50c93e6fb
> # Parent b6937b93141961b67dc642581266e6fc2015bc91
> New weighted fair-share CPU scheduler w/ automatic SMP load balancing
> Signed-off-by: Emmanuel Ackaouy <ack@xensource.com>
> ---
> tools/libxc/Makefile | 1
> tools/libxc/xc_csched.c | 50 +
> tools/libxc/xenctrl.h | 8
> tools/python/xen/lowlevel/xc/xc.c | 61 +
> tools/python/xen/xend/XendDomain.py | 22
> tools/python/xen/xend/server/SrvDomain.py | 14
> tools/python/xen/xm/main.py | 45 +
> xen/common/Makefile | 1
> xen/common/sched_credit.c | 1233 ++++++++++++++++++++++++++++++
> xen/common/schedule.c | 5
> xen/include/public/sched_ctl.h | 5
> xen/include/xen/sched-if.h | 2
> xen/include/xen/softirq.h | 13
> 13 files changed, 1460 insertions(+)
>
* Re: Re: [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing
2006-05-26 14:00 ` [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing Matt Ayres
@ 2006-05-26 14:15 ` Keir Fraser
0 siblings, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2006-05-26 14:15 UTC (permalink / raw)
To: Matt Ayres; +Cc: xen-devel
> Don't you think things like this should be committed with a readme file?
We'll make sure the updated docs are committed before we make this the
default scheduler.
-- Keir
* Re: [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing
[not found] <E1Fjbdr-0005cU-OD@xenbits.xensource.com>
2006-05-26 14:00 ` [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing Matt Ayres
@ 2006-05-26 15:51 ` Anthony Liguori
2006-05-26 20:13 ` Ewan Mellor
1 sibling, 1 reply; 4+ messages in thread
From: Anthony Liguori @ 2006-05-26 15:51 UTC (permalink / raw)
To: xen-devel; +Cc: Emmanuel Ackaouy
Just some random feedback.
Xen patchbot-unstable wrote:
>
> +static PyObject *pyxc_csched_domain_set(XcObject *self,
> + PyObject *args,
> + PyObject *kwds)
> +{
> + uint32_t domid;
> + uint16_t weight;
> + uint16_t cap;
> + static char *kwd_list[] = { "dom", "weight", "cap", NULL };
> + static char kwd_type[] = "I|HH";
> + struct csched_domain sdom;
> +
> + weight = 0;
> + cap = (uint16_t)~0U;
> + if( !PyArg_ParseTupleAndKeywords(args, kwds, kwd_type, kwd_list,
> + &domid, &weight, &cap) )
> + return NULL;
> +
> + sdom.weight = weight;
> + sdom.cap = cap;
> +
> + if ( xc_csched_domain_set(self->xc_handle, domid, &sdom) != 0 )
> + return PyErr_SetFromErrno(xc_error);
> +
> + Py_INCREF(zero);
> + return zero;
> +}
>
It's always seemed odd that we return zero here instead of Py_RETURN_NONE.
>
> + def domain_csched_get(self, domid):
> + """Get credit scheduler parameters for a domain.
> + """
> + dominfo = self.domain_lookup_by_name_or_id_nr(domid)
> + if not dominfo:
> + raise XendInvalidDomain(str(domid))
> + try:
> + return xc.csched_domain_get(dominfo.getDomid())
> + except Exception, ex:
> + raise XendError(str(ex))
> +
> + def domain_csched_set(self, domid, weight, cap):
> + """Set credit scheduler parameters for a domain.
> + """
> + dominfo = self.domain_lookup_by_name_or_id_nr(domid)
> + if not dominfo:
> + raise XendInvalidDomain(str(domid))
> + try:
> + return xc.csched_domain_set(dominfo.getDomid(), weight, cap)
> + except Exception, ex:
> + raise XendError(str(ex))
> +
>
Please don't catch Exception. The XML-RPC layer now properly propagates
all exceptions, so there's no need to rewrap things in XendError. Just
let the original exception propagate.
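A minimal sketch of the propagating version, written in current Python rather than the Python 2 of the patch. `FakeXc` and the local `XendInvalidDomain` are hypothetical stand-ins so the snippet runs on its own; they are not the real xend classes.

```python
import errno


class XendInvalidDomain(Exception):
    """Hypothetical stand-in for xend's exception of the same name."""


class FakeXc:
    """Hypothetical stand-in for the low-level xc binding."""

    def csched_domain_get(self, domid):
        if domid != 1:
            # Mimic a failing hypercall surfacing as an OS-level error.
            raise OSError(errno.ESRCH, "No such domain")
        return {"weight": 256, "cap": 0}


xc = FakeXc()


def domain_csched_get(domid):
    """Get credit scheduler parameters for a domain."""
    if domid is None:
        raise XendInvalidDomain(str(domid))
    # No try/except here: whatever the binding raises reaches the caller
    # with its original type and errno intact.
    return xc.csched_domain_get(domid)
```

The caller then sees the original exception type instead of a stringified copy inside a generic wrapper.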
> diff -r b6937b931419 -r e539abd27a0f tools/python/xen/xm/main.py
> --- a/tools/python/xen/xm/main.py Fri May 26 09:44:29 2006 +0100
> +++ b/tools/python/xen/xm/main.py Fri May 26 11:14:36 2006 +0100
> @@ -99,6 +99,7 @@ sched_sedf_help = "sched-sedf [DOM] [OPT
> specifies another way of setting a domain's\n\
> cpu period/slice."
>
> +csched_help = "csched Set or get credit scheduler parameters"
> block_attach_help = """block-attach <DomId> <BackDev> <FrontDev> <Mode>
> [BackDomId] Create a new virtual block device"""
> block_detach_help = """block-detach <DomId> <DevId> Destroy a domain's virtual block device,
> @@ -174,6 +175,7 @@ host_commands = [
> ]
>
> scheduler_commands = [
> + "csched",
> "sched-bvt",
> "sched-bvt-ctxallow",
> "sched-sedf",
> @@ -735,6 +737,48 @@ def xm_sched_sedf(args):
> else:
> print_sedf(sedf_info)
This seems to break the naming convention used by the other scheduler
commands. sched-csched may seem redundant, but that's what you get for
choosing a non-descriptive name for the scheduler in the first place ;-)
>
> +def xm_csched(args):
> + usage_msg = """Csched: Set or get credit scheduler parameters
> + Usage:
> +
> + csched -d domain [-w weight] [-c cap]
> + """
>
<snip>
> +
> + if weight is None and cap is None:
> + print server.xend.domain.csched_get(domain)
>
Do we really want to print out a raw dict? This should be pretty-printed.
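One possible shape for the pretty-printing, as a rough sketch. The helper name `format_csched` and the column width are made up for illustration; the keys are assumed to be `weight` and `cap` as in the patch's `struct csched_domain`.

```python
def format_csched(info):
    """Render scheduler parameters as aligned 'name value' lines."""
    # Sort the keys so the output order is stable across runs.
    lines = ["%-10s %s" % (key, info[key]) for key in sorted(info)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_csched({"weight": 256, "cap": 0}))
```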
> + else:
> + if weight is None:
> + weight = int(0)
> + if cap is None:
> + cap = int(~0)
>
>
Are these casts strictly necessary?
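For reference, a small sketch showing that the casts are no-ops on values that are already integers. The `0xFFFF` sentinel is an assumption taken from the `(uint16_t)~0U` default in the C binding quoted above, and `default_params` is a hypothetical helper name.

```python
CAP_UNCHANGED = 0xFFFF  # assumption: mirrors (uint16_t)~0U in the C binding


def default_params(weight=None, cap=None):
    """Fill in the 'leave unchanged' defaults without any int() casts."""
    if weight is None:
        weight = 0   # already an int; int(0) would return the same value
    if cap is None:
        cap = ~0     # already an int (-1); int(~0) would return the same value
    return weight, cap
```

Note that `~0` is `-1` in Python, so it only matches the C-side sentinel once it has been truncated to 16 bits.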
> + err = server.xend.domain.csched_set(domain, weight, cap)
> + if err != 0:
> + print err
>
From the functions, won't it only ever return 0 or throw an exception?
This check doesn't seem necessary. I think it would be a lot better to
just have the function return None on success and let the exception
handling in main.py deal with errors.
At the very least, printing err doesn't help much :-)
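A sketch of that pattern, with hypothetical names throughout (`csched_set` and `run` here are illustrations, not the real xm or xend functions): the setter returns None on success and raises on failure, and a single top-level handler turns exceptions into a message and an exit code.

```python
import sys


def csched_set(domain, weight, cap):
    """Hypothetical setter: returns None on success, raises on failure."""
    if not domain:
        raise ValueError("no domain given")
    # (the real call into xend would go here)
    return None


def run(domain, weight, cap):
    """Top-level handler: the one place that reports errors to the user."""
    try:
        csched_set(domain, weight, cap)
    except ValueError as ex:
        sys.stderr.write("Error: %s\n" % ex)
        return 1
    return 0
```

With this split there is no error code for the command function to check or print.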
>
> def xm_info(args):
> arg_check(args, "info", 0)
> @@ -1032,6 +1076,7 @@ commands = {
> "sched-bvt": xm_sched_bvt,
> "sched-bvt-ctxallow": xm_sched_bvt_ctxallow,
> "sched-sedf": xm_sched_sedf,
> + "csched": xm_csched,
> # block
> "block-attach": xm_block_attach,
> "block-detach": xm_block_detach,
>
I'll look at the rest later. Aside from the minor nitpicks, it all
looks pretty sane.
Regards,
Anthony Liguori
> diff -r b6937b931419 -r e539abd27a0f xen/common/Makefile
> --- a/xen/common/Makefile Fri May 26 09:44:29 2006 +0100
> +++ b/xen/common/Makefile Fri May 26 11:14:36 2006 +0100
> @@ -13,6 +13,7 @@ obj-y += page_alloc.o
> obj-y += page_alloc.o
> obj-y += rangeset.o
> obj-y += sched_bvt.o
> +obj-y += sched_credit.o
> obj-y += sched_sedf.o
> obj-y += schedule.o
> obj-y += softirq.o
> diff -r b6937b931419 -r e539abd27a0f xen/common/schedule.c
> --- a/xen/common/schedule.c Fri May 26 09:44:29 2006 +0100
> +++ b/xen/common/schedule.c Fri May 26 11:14:36 2006 +0100
> @@ -50,9 +50,11 @@ struct schedule_data schedule_data[NR_CP
>
> extern struct scheduler sched_bvt_def;
> extern struct scheduler sched_sedf_def;
> +extern struct scheduler sched_credit_def;
> static struct scheduler *schedulers[] = {
> &sched_bvt_def,
> &sched_sedf_def,
> + &sched_credit_def,
> NULL
> };
>
> @@ -639,6 +641,8 @@ static void t_timer_fn(void *unused)
>
> page_scrub_schedule_work();
>
> + SCHED_OP(tick, cpu);
> +
> set_timer(&t_timer[cpu], NOW() + MILLISECS(10));
> }
>
> @@ -681,6 +685,7 @@ void __init scheduler_init(void)
> printk("Could not find scheduler: %s\n", opt_sched);
>
> printk("Using scheduler: %s (%s)\n", ops.name, ops.opt_name);
> + SCHED_OP(init);
>
> if ( idle_vcpu[0] != NULL )
> {
> diff -r b6937b931419 -r e539abd27a0f xen/include/public/sched_ctl.h
> --- a/xen/include/public/sched_ctl.h Fri May 26 09:44:29 2006 +0100
> +++ b/xen/include/public/sched_ctl.h Fri May 26 11:14:36 2006 +0100
> @@ -10,6 +10,7 @@
> /* Scheduler types. */
> #define SCHED_BVT 0
> #define SCHED_SEDF 4
> +#define SCHED_CREDIT 5
>
> /* Set or get info? */
> #define SCHED_INFO_PUT 0
> @@ -48,6 +49,10 @@ struct sched_adjdom_cmd {
> uint32_t extratime;
> uint32_t weight;
> } sedf;
> + struct csched_domain {
> + uint16_t weight;
> + uint16_t cap;
> + } credit;
> } u;
> };
>
> diff -r b6937b931419 -r e539abd27a0f xen/include/xen/sched-if.h
> --- a/xen/include/xen/sched-if.h Fri May 26 09:44:29 2006 +0100
> +++ b/xen/include/xen/sched-if.h Fri May 26 11:14:36 2006 +0100
> @@ -58,6 +58,8 @@ struct scheduler {
> char *opt_name; /* option name for this scheduler */
> unsigned int sched_id; /* ID for this scheduler */
>
> + void (*init) (void);
> + void (*tick) (unsigned int cpu);
> int (*alloc_task) (struct vcpu *);
> void (*add_task) (struct vcpu *);
> void (*free_task) (struct domain *);
> diff -r b6937b931419 -r e539abd27a0f xen/include/xen/softirq.h
> --- a/xen/include/xen/softirq.h Fri May 26 09:44:29 2006 +0100
> +++ b/xen/include/xen/softirq.h Fri May 26 11:14:36 2006 +0100
> @@ -26,6 +26,19 @@ asmlinkage void do_softirq(void);
> asmlinkage void do_softirq(void);
> extern void open_softirq(int nr, softirq_handler handler);
>
> +static inline void cpumask_raise_softirq(cpumask_t mask, unsigned int nr)
> +{
> + int cpu;
> +
> + for_each_cpu_mask(cpu, mask)
> + {
> + if ( test_and_set_bit(nr, &softirq_pending(cpu)) )
> + cpu_clear(cpu, mask);
> + }
> +
> + smp_send_event_check_mask(mask);
> +}
> +
> static inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
> {
> if ( !test_and_set_bit(nr, &softirq_pending(cpu)) )
> diff -r b6937b931419 -r e539abd27a0f tools/libxc/xc_csched.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxc/xc_csched.c Fri May 26 11:14:36 2006 +0100
> @@ -0,0 +1,50 @@
> +/****************************************************************************
> + * (C) 2006 - Emmanuel Ackaouy - XenSource Inc.
> + ****************************************************************************
> + *
> + * File: xc_csched.c
> + * Author: Emmanuel Ackaouy
> + *
> + * Description: XC Interface to the credit scheduler
> + *
> + */
> +#include "xc_private.h"
> +
> +
> +int
> +xc_csched_domain_set(
> + int xc_handle,
> + uint32_t domid,
> + struct csched_domain *sdom)
> +{
> + DECLARE_DOM0_OP;
> +
> + op.cmd = DOM0_ADJUSTDOM;
> + op.u.adjustdom.domain = (domid_t) domid;
> + op.u.adjustdom.sched_id = SCHED_CREDIT;
> + op.u.adjustdom.direction = SCHED_INFO_PUT;
> + op.u.adjustdom.u.credit = *sdom;
> +
> + return do_dom0_op(xc_handle, &op);
> +}
> +
> +int
> +xc_csched_domain_get(
> + int xc_handle,
> + uint32_t domid,
> + struct csched_domain *sdom)
> +{
> + DECLARE_DOM0_OP;
> + int err;
> +
> + op.cmd = DOM0_ADJUSTDOM;
> + op.u.adjustdom.domain = (domid_t) domid;
> + op.u.adjustdom.sched_id = SCHED_CREDIT;
> + op.u.adjustdom.direction = SCHED_INFO_GET;
> +
> + err = do_dom0_op(xc_handle, &op);
> + if ( err == 0 )
> + *sdom = op.u.adjustdom.u.credit;
> +
> + return err;
> +}
> diff -r b6937b931419 -r e539abd27a0f xen/common/sched_credit.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/xen/common/sched_credit.c Fri May 26 11:14:36 2006 +0100
> @@ -0,0 +1,1233 @@
> +/****************************************************************************
> + * (C) 2005-2006 - Emmanuel Ackaouy - XenSource Inc.
> + ****************************************************************************
> + *
> + * File: common/csched_credit.c
> + * Author: Emmanuel Ackaouy
> + *
> + * Description: Credit-based SMP CPU scheduler
> + */
> +
> +#include <xen/config.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/sched.h>
> +#include <xen/domain.h>
> +#include <xen/delay.h>
> +#include <xen/event.h>
> +#include <xen/time.h>
> +#include <xen/perfc.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <asm/atomic.h>
> +
> +
> +/*
> + * CSCHED_STATS
> + *
> + * Manage very basic counters and stats.
> + *
> + * Useful for debugging live systems. The stats are displayed
> + * with runq dumps ('r' on the Xen console).
> + */
> +#define CSCHED_STATS
> +
> +
> +/*
> + * Basic constants
> + */
> +#define CSCHED_TICK 10 /* milliseconds */
> +#define CSCHED_TSLICE 30 /* milliseconds */
> +#define CSCHED_ACCT_NTICKS 3
> +#define CSCHED_ACCT_PERIOD (CSCHED_ACCT_NTICKS * CSCHED_TICK)
> +#define CSCHED_DEFAULT_WEIGHT 256
> +
> +
> +/*
> + * Priorities
> + */
> +#define CSCHED_PRI_TS_UNDER -1 /* time-share w/ credits */
> +#define CSCHED_PRI_TS_OVER -2 /* time-share w/o credits */
> +#define CSCHED_PRI_IDLE -64 /* idle */
> +#define CSCHED_PRI_TS_PARKED -65 /* time-share w/ capped credits */
> +
> +
> +/*
> + * Useful macros
> + */
> +#define CSCHED_PCPU(_c) ((struct csched_pcpu *)schedule_data[_c].sched_priv)
> +#define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
> +#define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
> +#define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
> +
> +
> +/*
> + * Stats
> + */
> +#ifdef CSCHED_STATS
> +
> +#define CSCHED_STAT(_X) (csched_priv.stats._X)
> +#define CSCHED_STAT_DEFINE(_X) uint32_t _X;
> +#define CSCHED_STAT_PRINTK(_X) \
> + do \
> + { \
> + printk("\t%-30s = %u\n", #_X, CSCHED_STAT(_X)); \
> + } while ( 0 );
> +
> +#define CSCHED_STATS_EXPAND_SCHED(_MACRO) \
> + _MACRO(vcpu_alloc) \
> + _MACRO(vcpu_add) \
> + _MACRO(vcpu_sleep) \
> + _MACRO(vcpu_wake_running) \
> + _MACRO(vcpu_wake_onrunq) \
> + _MACRO(vcpu_wake_runnable) \
> + _MACRO(vcpu_wake_not_runnable) \
> + _MACRO(dom_free) \
> + _MACRO(schedule) \
> + _MACRO(tickle_local_idler) \
> + _MACRO(tickle_local_over) \
> + _MACRO(tickle_local_under) \
> + _MACRO(tickle_local_other) \
> + _MACRO(acct_run) \
> + _MACRO(acct_no_work) \
> + _MACRO(acct_balance) \
> + _MACRO(acct_reorder) \
> + _MACRO(acct_min_credit) \
> + _MACRO(acct_vcpu_active) \
> + _MACRO(acct_vcpu_idle) \
> + _MACRO(acct_vcpu_credit_min)
> +
> +#define CSCHED_STATS_EXPAND_SMP_LOAD_BALANCE(_MACRO) \
> + _MACRO(vcpu_migrate) \
> + _MACRO(load_balance_idle) \
> + _MACRO(load_balance_over) \
> + _MACRO(load_balance_other) \
> + _MACRO(steal_trylock_failed) \
> + _MACRO(steal_peer_down) \
> + _MACRO(steal_peer_idle) \
> + _MACRO(steal_peer_running) \
> + _MACRO(steal_peer_pinned) \
> + _MACRO(tickle_idlers_none) \
> + _MACRO(tickle_idlers_some)
> +
> +#ifndef NDEBUG
> +#define CSCHED_STATS_EXPAND_CHECKS(_MACRO) \
> + _MACRO(vcpu_check)
> +#else
> +#define CSCHED_STATS_EXPAND_CHECKS(_MACRO)
> +#endif
> +
> +#define CSCHED_STATS_EXPAND(_MACRO) \
> + CSCHED_STATS_EXPAND_SCHED(_MACRO) \
> + CSCHED_STATS_EXPAND_SMP_LOAD_BALANCE(_MACRO) \
> + CSCHED_STATS_EXPAND_CHECKS(_MACRO)
> +
> +#define CSCHED_STATS_RESET() \
> + do \
> + { \
> + memset(&csched_priv.stats, 0, sizeof(csched_priv.stats)); \
> + } while ( 0 )
> +
> +#define CSCHED_STATS_DEFINE() \
> + struct \
> + { \
> + CSCHED_STATS_EXPAND(CSCHED_STAT_DEFINE) \
> + } stats
> +
> +#define CSCHED_STATS_PRINTK() \
> + do \
> + { \
> + printk("stats:\n"); \
> + CSCHED_STATS_EXPAND(CSCHED_STAT_PRINTK) \
> + } while ( 0 )
> +
> +#define CSCHED_STAT_CRANK(_X) (CSCHED_STAT(_X)++)
> +
> +#else /* CSCHED_STATS */
> +
> +#define CSCHED_STATS_RESET() do {} while ( 0 )
> +#define CSCHED_STATS_DEFINE() do {} while ( 0 )
> +#define CSCHED_STATS_PRINTK() do {} while ( 0 )
> +#define CSCHED_STAT_CRANK(_X) do {} while ( 0 )
> +
> +#endif /* CSCHED_STATS */
> +
> +
> +/*
> + * Physical CPU
> + */
> +struct csched_pcpu {
> + struct list_head runq;
> + uint32_t runq_sort_last;
> +};
> +
> +/*
> + * Virtual CPU
> + */
> +struct csched_vcpu {
> + struct list_head runq_elem;
> + struct list_head active_vcpu_elem;
> + struct csched_dom *sdom;
> + struct vcpu *vcpu;
> + atomic_t credit;
> + int credit_last;
> + uint32_t credit_incr;
> + uint32_t state_active;
> + uint32_t state_idle;
> + int16_t pri;
> +};
> +
> +/*
> + * Domain
> + */
> +struct csched_dom {
> + struct list_head active_vcpu;
> + struct list_head active_sdom_elem;
> + struct domain *dom;
> + uint16_t active_vcpu_count;
> + uint16_t weight;
> + uint16_t cap;
> +};
> +
> +/*
> + * System-wide private data
> + */
> +struct csched_private {
> + spinlock_t lock;
> + struct list_head active_sdom;
> + uint32_t ncpus;
> + unsigned int master;
> + cpumask_t idlers;
> + uint32_t weight;
> + uint32_t credit;
> + int credit_balance;
> + uint32_t runq_sort;
> + CSCHED_STATS_DEFINE();
> +};
> +
> +
> +/*
> + * Global variables
> + */
> +static struct csched_private csched_priv;
> +
> +
> +
> +static inline int
> +__vcpu_on_runq(struct csched_vcpu *svc)
> +{
> + return !list_empty(&svc->runq_elem);
> +}
> +
> +static inline struct csched_vcpu *
> +__runq_elem(struct list_head *elem)
> +{
> + return list_entry(elem, struct csched_vcpu, runq_elem);
> +}
> +
> +static inline void
> +__runq_insert(unsigned int cpu, struct csched_vcpu *svc)
> +{
> + const struct list_head * const runq = RUNQ(cpu);
> + struct list_head *iter;
> +
> + BUG_ON( __vcpu_on_runq(svc) );
> + BUG_ON( cpu != svc->vcpu->processor );
> +
> + list_for_each( iter, runq )
> + {
> + const struct csched_vcpu * const iter_svc = __runq_elem(iter);
> + if ( svc->pri > iter_svc->pri )
> + break;
> + }
> +
> + list_add_tail(&svc->runq_elem, iter);
> +}
> +
> +static inline void
> +__runq_remove(struct csched_vcpu *svc)
> +{
> + BUG_ON( !__vcpu_on_runq(svc) );
> + list_del_init(&svc->runq_elem);
> +}
> +
> +static inline void
> +__runq_tickle(unsigned int cpu, struct csched_vcpu *new)
> +{
> + struct csched_vcpu * const cur = CSCHED_VCPU(schedule_data[cpu].curr);
> + cpumask_t mask;
> +
> + ASSERT(cur);
> + cpus_clear(mask);
> +
> + /* If strictly higher priority than current VCPU, signal the CPU */
> + if ( new->pri > cur->pri )
> + {
> + if ( cur->pri == CSCHED_PRI_IDLE )
> + CSCHED_STAT_CRANK(tickle_local_idler);
> + else if ( cur->pri == CSCHED_PRI_TS_OVER )
> + CSCHED_STAT_CRANK(tickle_local_over);
> + else if ( cur->pri == CSCHED_PRI_TS_UNDER )
> + CSCHED_STAT_CRANK(tickle_local_under);
> + else
> + CSCHED_STAT_CRANK(tickle_local_other);
> +
> + cpu_set(cpu, mask);
> + }
> +
> + /*
> + * If this CPU has at least two runnable VCPUs, we tickle any idlers to
> + * let them know there is runnable work in the system...
> + */
> + if ( cur->pri > CSCHED_PRI_IDLE )
> + {
> + if ( cpus_empty(csched_priv.idlers) )
> + {
> + CSCHED_STAT_CRANK(tickle_idlers_none);
> + }
> + else
> + {
> + CSCHED_STAT_CRANK(tickle_idlers_some);
> + cpus_or(mask, mask, csched_priv.idlers);
> + }
> + }
> +
> + /* Send scheduler interrupts to designated CPUs */
> + if ( !cpus_empty(mask) )
> + cpumask_raise_softirq(mask, SCHEDULE_SOFTIRQ);
> +}
> +
> +static void
> +csched_pcpu_init(int cpu)
> +{
> + struct csched_pcpu *spc;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&csched_priv.lock, flags);
> +
> + /* Initialize/update system-wide config */
> + csched_priv.credit += CSCHED_ACCT_PERIOD;
> + if ( csched_priv.ncpus <= cpu )
> + csched_priv.ncpus = cpu + 1;
> + if ( csched_priv.master >= csched_priv.ncpus )
> + csched_priv.master = cpu;
> +
> + /* Allocate per-PCPU info */
> + spc = xmalloc(struct csched_pcpu);
> + BUG_ON( spc == NULL );
> + INIT_LIST_HEAD(&spc->runq);
> + spc->runq_sort_last = csched_priv.runq_sort;
> + schedule_data[cpu].sched_priv = spc;
> +
> + /* Start off idling... */
> + BUG_ON( !is_idle_vcpu(schedule_data[cpu].curr) );
> + cpu_set(cpu, csched_priv.idlers);
> +
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> +}
> +
> +#ifndef NDEBUG
> +static inline void
> +__csched_vcpu_check(struct vcpu *vc)
> +{
> + struct csched_vcpu * const svc = CSCHED_VCPU(vc);
> + struct csched_dom * const sdom = svc->sdom;
> +
> + BUG_ON( svc->vcpu != vc );
> + BUG_ON( sdom != CSCHED_DOM(vc->domain) );
> + if ( sdom )
> + {
> + BUG_ON( is_idle_vcpu(vc) );
> + BUG_ON( sdom->dom != vc->domain );
> + }
> + else
> + {
> + BUG_ON( !is_idle_vcpu(vc) );
> + }
> +
> + CSCHED_STAT_CRANK(vcpu_check);
> +}
> +#define CSCHED_VCPU_CHECK(_vc) (__csched_vcpu_check(_vc))
> +#else
> +#define CSCHED_VCPU_CHECK(_vc)
> +#endif
> +
> +static inline int
> +__csched_vcpu_is_stealable(int local_cpu, struct vcpu *vc)
> +{
> + /*
> + * Don't pick up work that's in the peer's scheduling tail. Also only pick
> + * up work that's allowed to run on our CPU.
> + */
> + if ( unlikely(test_bit(_VCPUF_running, &vc->vcpu_flags)) )
> + {
> + CSCHED_STAT_CRANK(steal_peer_running);
> + return 0;
> + }
> +
> + if ( unlikely(!cpu_isset(local_cpu, vc->cpu_affinity)) )
> + {
> + CSCHED_STAT_CRANK(steal_peer_pinned);
> + return 0;
> + }
> +
> + return 1;
> +}
> +
> +static void
> +csched_vcpu_acct(struct csched_vcpu *svc, int credit_dec)
> +{
> + struct csched_dom * const sdom = svc->sdom;
> + unsigned long flags;
> +
> + /* Update credits */
> + atomic_sub(credit_dec, &svc->credit);
> +
> + /* Put this VCPU and domain back on the active list if it was idling */
> + if ( list_empty(&svc->active_vcpu_elem) )
> + {
> + spin_lock_irqsave(&csched_priv.lock, flags);
> +
> + if ( list_empty(&svc->active_vcpu_elem) )
> + {
> + CSCHED_STAT_CRANK(acct_vcpu_active);
> + svc->state_active++;
> +
> + sdom->active_vcpu_count++;
> + list_add(&svc->active_vcpu_elem, &sdom->active_vcpu);
> + if ( list_empty(&sdom->active_sdom_elem) )
> + {
> + list_add(&sdom->active_sdom_elem, &csched_priv.active_sdom);
> + csched_priv.weight += sdom->weight;
> + }
> + }
> +
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> + }
> +}
> +
> +static inline void
> +__csched_vcpu_acct_idle_locked(struct csched_vcpu *svc)
> +{
> + struct csched_dom * const sdom = svc->sdom;
> +
> + BUG_ON( list_empty(&svc->active_vcpu_elem) );
> +
> + CSCHED_STAT_CRANK(acct_vcpu_idle);
> + svc->state_idle++;
> +
> + sdom->active_vcpu_count--;
> + list_del_init(&svc->active_vcpu_elem);
> + if ( list_empty(&sdom->active_vcpu) )
> + {
> + BUG_ON( csched_priv.weight < sdom->weight );
> + list_del_init(&sdom->active_sdom_elem);
> + csched_priv.weight -= sdom->weight;
> + }
> +
> + atomic_set(&svc->credit, 0);
> +}
> +
> +static int
> +csched_vcpu_alloc(struct vcpu *vc)
> +{
> + struct domain * const dom = vc->domain;
> + struct csched_dom *sdom;
> + struct csched_vcpu *svc;
> + int16_t pri;
> +
> + CSCHED_STAT_CRANK(vcpu_alloc);
> +
> + /* Allocate, if appropriate, per-domain info */
> + if ( is_idle_vcpu(vc) )
> + {
> + sdom = NULL;
> + pri = CSCHED_PRI_IDLE;
> + }
> + else if ( CSCHED_DOM(dom) )
> + {
> + sdom = CSCHED_DOM(dom);
> + pri = CSCHED_PRI_TS_UNDER;
> + }
> + else
> + {
> + sdom = xmalloc(struct csched_dom);
> + if ( !sdom )
> + return -1;
> +
> + /* Initialize credit and weight */
> + INIT_LIST_HEAD(&sdom->active_vcpu);
> + sdom->active_vcpu_count = 0;
> + INIT_LIST_HEAD(&sdom->active_sdom_elem);
> + sdom->dom = dom;
> + sdom->weight = CSCHED_DEFAULT_WEIGHT;
> + sdom->cap = 0U;
> + dom->sched_priv = sdom;
> + pri = CSCHED_PRI_TS_UNDER;
> + }
> +
> + /* Allocate per-VCPU info */
> + svc = xmalloc(struct csched_vcpu);
> + if ( !svc )
> + return -1;
> +
> + INIT_LIST_HEAD(&svc->runq_elem);
> + INIT_LIST_HEAD(&svc->active_vcpu_elem);
> + svc->sdom = sdom;
> + svc->vcpu = vc;
> + atomic_set(&svc->credit, 0);
> + svc->credit_last = 0;
> + svc->credit_incr = 0U;
> + svc->state_active = 0U;
> + svc->state_idle = 0U;
> + svc->pri = pri;
> + vc->sched_priv = svc;
> +
> + CSCHED_VCPU_CHECK(vc);
> +
> + /* Attach fair-share VCPUs to the accounting list */
> + if ( likely(sdom != NULL) )
> + csched_vcpu_acct(svc, 0);
> +
> + return 0;
> +}
> +
> +static void
> +csched_vcpu_add(struct vcpu *vc)
> +{
> + CSCHED_STAT_CRANK(vcpu_add);
> +
> + /* Allocate per-PCPU info */
> + if ( unlikely(!CSCHED_PCPU(vc->processor)) )
> + csched_pcpu_init(vc->processor);
> +
> + CSCHED_VCPU_CHECK(vc);
> +}
> +
> +static void
> +csched_vcpu_free(struct vcpu *vc)
> +{
> + struct csched_vcpu * const svc = CSCHED_VCPU(vc);
> + struct csched_dom * const sdom = svc->sdom;
> + unsigned long flags;
> +
> + BUG_ON( sdom == NULL );
> + BUG_ON( !list_empty(&svc->runq_elem) );
> +
> + spin_lock_irqsave(&csched_priv.lock, flags);
> +
> + if ( !list_empty(&svc->active_vcpu_elem) )
> + __csched_vcpu_acct_idle_locked(svc);
> +
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> +
> + xfree(svc);
> +}
> +
> +static void
> +csched_vcpu_sleep(struct vcpu *vc)
> +{
> + struct csched_vcpu * const svc = CSCHED_VCPU(vc);
> +
> + CSCHED_STAT_CRANK(vcpu_sleep);
> +
> + BUG_ON( is_idle_vcpu(vc) );
> +
> + if ( schedule_data[vc->processor].curr == vc )
> + cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
> + else if ( __vcpu_on_runq(svc) )
> + __runq_remove(svc);
> +}
> +
> +static void
> +csched_vcpu_wake(struct vcpu *vc)
> +{
> + struct csched_vcpu * const svc = CSCHED_VCPU(vc);
> + const unsigned int cpu = vc->processor;
> +
> + BUG_ON( is_idle_vcpu(vc) );
> +
> + if ( unlikely(schedule_data[cpu].curr == vc) )
> + {
> + CSCHED_STAT_CRANK(vcpu_wake_running);
> + return;
> + }
> + if ( unlikely(__vcpu_on_runq(svc)) )
> + {
> + CSCHED_STAT_CRANK(vcpu_wake_onrunq);
> + return;
> + }
> +
> + if ( likely(vcpu_runnable(vc)) )
> + CSCHED_STAT_CRANK(vcpu_wake_runnable);
> + else
> + CSCHED_STAT_CRANK(vcpu_wake_not_runnable);
> +
> + /* Put the VCPU on the runq and tickle CPUs */
> + __runq_insert(cpu, svc);
> + __runq_tickle(cpu, svc);
> +}
> +
> +static int
> +csched_vcpu_set_affinity(struct vcpu *vc, cpumask_t *affinity)
> +{
> + unsigned long flags;
> + int lcpu;
> +
> + if ( vc == current )
> + {
> + /* No locking needed but also can't move on the spot... */
> + if ( !cpu_isset(vc->processor, *affinity) )
> + return -EBUSY;
> +
> + vc->cpu_affinity = *affinity;
> + }
> + else
> + {
> + /* Pause, modify, and unpause. */
> + vcpu_pause(vc);
> +
> + vc->cpu_affinity = *affinity;
> + if ( !cpu_isset(vc->processor, vc->cpu_affinity) )
> + {
> + /*
> + * We must grab the scheduler lock for the CPU currently owning
> + * this VCPU before changing its ownership.
> + */
> + vcpu_schedule_lock_irqsave(vc, flags);
> + lcpu = vc->processor;
> +
> + vc->processor = first_cpu(vc->cpu_affinity);
> +
> + spin_unlock_irqrestore(&schedule_data[lcpu].schedule_lock, flags);
> + }
> +
> + vcpu_unpause(vc);
> + }
> +
> + return 0;
> +}
> +
> +static int
> +csched_dom_cntl(
> + struct domain *d,
> + struct sched_adjdom_cmd *cmd)
> +{
> + struct csched_dom * const sdom = CSCHED_DOM(d);
> + unsigned long flags;
> +
> + if ( cmd->direction == SCHED_INFO_GET )
> + {
> + cmd->u.credit.weight = sdom->weight;
> + cmd->u.credit.cap = sdom->cap;
> + }
> + else
> + {
> + ASSERT( cmd->direction == SCHED_INFO_PUT );
> +
> + spin_lock_irqsave(&csched_priv.lock, flags);
> +
> + if ( cmd->u.credit.weight != 0 )
> + {
> + csched_priv.weight -= sdom->weight;
> + sdom->weight = cmd->u.credit.weight;
> + csched_priv.weight += sdom->weight;
> + }
> +
> + if ( cmd->u.credit.cap != (uint16_t)~0U )
> + sdom->cap = cmd->u.credit.cap;
> +
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> + }
> +
> + return 0;
> +}
> +
> +static void
> +csched_dom_free(struct domain *dom)
> +{
> + struct csched_dom * const sdom = CSCHED_DOM(dom);
> + int i;
> +
> + CSCHED_STAT_CRANK(dom_free);
> +
> + for ( i = 0; i < MAX_VIRT_CPUS; i++ )
> + {
> + if ( dom->vcpu[i] )
> + csched_vcpu_free(dom->vcpu[i]);
> + }
> +
> + xfree(sdom);
> +}
> +
> +/*
> + * This is a O(n) optimized sort of the runq.
> + *
> + * Time-share VCPUs can only be one of two priorities, UNDER or OVER. We walk
> + * through the runq and move up any UNDERs that are preceded by OVERS. We
> + * remember the last UNDER to make the move up operation O(1).
> + */
> +static void
> +csched_runq_sort(unsigned int cpu)
> +{
> + struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
> + struct list_head *runq, *elem, *next, *last_under;
> + struct csched_vcpu *svc_elem;
> + unsigned long flags;
> + int sort_epoch;
> +
> + sort_epoch = csched_priv.runq_sort;
> + if ( sort_epoch == spc->runq_sort_last )
> + return;
> +
> + spc->runq_sort_last = sort_epoch;
> +
> + spin_lock_irqsave(&schedule_data[cpu].schedule_lock, flags);
> +
> + runq = &spc->runq;
> + elem = runq->next;
> + last_under = runq;
> +
> + while ( elem != runq )
> + {
> + next = elem->next;
> + svc_elem = __runq_elem(elem);
> +
> + if ( svc_elem->pri == CSCHED_PRI_TS_UNDER )
> + {
> + /* does elem need to move up the runq? */
> + if ( elem->prev != last_under )
> + {
> + list_del(elem);
> + list_add(elem, last_under);
> + }
> + last_under = elem;
> + }
> +
> + elem = next;
> + }
> +
> + spin_unlock_irqrestore(&schedule_data[cpu].schedule_lock, flags);
> +}
> +
> +static void
> +csched_acct(void)
> +{
> + unsigned long flags;
> + struct list_head *iter_vcpu, *next_vcpu;
> + struct list_head *iter_sdom, *next_sdom;
> + struct csched_vcpu *svc;
> + struct csched_dom *sdom;
> + uint32_t credit_total;
> + uint32_t weight_total;
> + uint32_t weight_left;
> + uint32_t credit_fair;
> + uint32_t credit_peak;
> + int credit_balance;
> + int credit_xtra;
> + int credit;
> +
> +
> + spin_lock_irqsave(&csched_priv.lock, flags);
> +
> + weight_total = csched_priv.weight;
> + credit_total = csched_priv.credit;
> +
> + /* Converge balance towards 0 when it drops negative */
> + if ( csched_priv.credit_balance < 0 )
> + {
> + credit_total -= csched_priv.credit_balance;
> + CSCHED_STAT_CRANK(acct_balance);
> + }
> +
> + if ( unlikely(weight_total == 0) )
> + {
> + csched_priv.credit_balance = 0;
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> + CSCHED_STAT_CRANK(acct_no_work);
> + return;
> + }
> +
> + CSCHED_STAT_CRANK(acct_run);
> +
> + weight_left = weight_total;
> + credit_balance = 0;
> + credit_xtra = 0;
> +
> + list_for_each_safe( iter_sdom, next_sdom, &csched_priv.active_sdom )
> + {
> + sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
> +
> + BUG_ON( is_idle_domain(sdom->dom) );
> + BUG_ON( sdom->active_vcpu_count == 0 );
> + BUG_ON( sdom->weight == 0 );
> + BUG_ON( sdom->weight > weight_left );
> +
> + weight_left -= sdom->weight;
> +
> + /*
> + * A domain's fair share is computed using its weight in competition
> + * with that of all other active domains.
> + *
> + * At most, a domain can use credits to run all its active VCPUs
> + * for one full accounting period. We allow a domain to earn more
> + * only when the system-wide credit balance is negative.
> + */
> + credit_peak = sdom->active_vcpu_count * CSCHED_ACCT_PERIOD;
> + if ( csched_priv.credit_balance < 0 )
> + {
> + credit_peak += ( ( -csched_priv.credit_balance * sdom->weight) +
> + (weight_total - 1)
> + ) / weight_total;
> + }
> + if ( sdom->cap != 0U )
> + {
> + uint32_t credit_cap = ((sdom->cap * CSCHED_ACCT_PERIOD) + 99) / 100;
> + if ( credit_cap < credit_peak )
> + credit_peak = credit_cap;
> + }
> +
> + credit_fair = ( ( credit_total * sdom->weight) + (weight_total - 1)
> + ) / weight_total;
> +
> + if ( credit_fair < credit_peak )
> + {
> + credit_xtra = 1;
> + }
> + else
> + {
> + if ( weight_left != 0U )
> + {
> + /* Give other domains a chance at unused credits */
> + credit_total += ( ( ( credit_fair - credit_peak
> + ) * weight_total
> + ) + ( weight_left - 1 )
> + ) / weight_left;
> + }
> +
> + if ( credit_xtra )
> + {
> + /*
> + * Lazily keep domains with extra credits at the head of
> + * the queue to give others a chance at them in future
> + * accounting periods.
> + */
> + CSCHED_STAT_CRANK(acct_reorder);
> + list_del(&sdom->active_sdom_elem);
> + list_add(&sdom->active_sdom_elem, &csched_priv.active_sdom);
> + }
> +
> + credit_fair = credit_peak;
> + }
> +
> + /* Compute fair share per VCPU */
> + credit_fair = ( credit_fair + ( sdom->active_vcpu_count - 1 )
> + ) / sdom->active_vcpu_count;
> +
> +
> + list_for_each_safe( iter_vcpu, next_vcpu, &sdom->active_vcpu )
> + {
> + svc = list_entry(iter_vcpu, struct csched_vcpu, active_vcpu_elem);
> + BUG_ON( sdom != svc->sdom );
> +
> + /* Increment credit */
> + atomic_add(credit_fair, &svc->credit);
> + credit = atomic_read(&svc->credit);
> +
> + /*
> + * Recompute priority or, if VCPU is idling, remove it from
> + * the active list.
> + */
> + if ( credit < 0 )
> + {
> + if ( sdom->cap == 0U )
> + svc->pri = CSCHED_PRI_TS_OVER;
> + else
> + svc->pri = CSCHED_PRI_TS_PARKED;
> +
> + if ( credit < -CSCHED_TSLICE )
> + {
> + CSCHED_STAT_CRANK(acct_min_credit);
> + credit = -CSCHED_TSLICE;
> + atomic_set(&svc->credit, credit);
> + }
> + }
> + else
> + {
> + svc->pri = CSCHED_PRI_TS_UNDER;
> +
> + if ( credit > CSCHED_TSLICE )
> + __csched_vcpu_acct_idle_locked(svc);
> + }
> +
> + svc->credit_last = credit;
> + svc->credit_incr = credit_fair;
> + credit_balance += credit;
> + }
> + }
> +
> + csched_priv.credit_balance = credit_balance;
> +
> + spin_unlock_irqrestore(&csched_priv.lock, flags);
> +
> + /* Inform each CPU that its runq needs to be sorted */
> + csched_priv.runq_sort++;
> +}
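
For readers following the arithmetic in csched_acct() above, here is a minimal Python sketch of the per-domain credit computation. It is an illustration under stated assumptions, not the patch's code: the names domain_credit and ceil_div are hypothetical, the default accounting period of 30 is taken from the "currently 30 milliseconds" comment in csched_tick(), and the redistribution of unused credits to other domains (weight_left) and the active-list reordering are left out.

```python
def ceil_div(a, b):
    """Integer division rounding up, as in (a + (b - 1)) / b in the C code."""
    return (a + b - 1) // b


def domain_credit(credit_total, weight_total, weight, active_vcpus,
                  cap=0, credit_balance=0, acct_period=30):
    """Return (domain_credit, per_vcpu_credit) for one accounting pass.

    Hypothetical helper: parameter names mirror the variables in the patch,
    but acct_period=30 is an assumption based on a comment in the source.
    """
    # A domain cannot earn more than one full period per active VCPU...
    credit_peak = active_vcpus * acct_period
    # ...unless the system-wide balance is negative.
    if credit_balance < 0:
        credit_peak += ceil_div(-credit_balance * weight, weight_total)
    # A non-zero cap (percent of one CPU) limits the peak further.
    if cap != 0:
        credit_peak = min(credit_peak, ceil_div(cap * acct_period, 100))

    # Weighted fair share of the credits handed out this period.
    credit_fair = ceil_div(credit_total * weight, weight_total)
    # The fair share is clipped to the peak; the real code also passes the
    # excess on to the remaining domains, which this sketch omits.
    credit_fair = min(credit_fair, credit_peak)

    return credit_fair, ceil_div(credit_fair, active_vcpus)
```

With two physical CPUs (credit_total = 60) and two equally weighted domains, domain_credit(60, 512, 256, 1) gives each single-VCPU domain a full period of credit; adding cap=10 clips the same domain to ceil(10 * 30 / 100) = 3.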
> +
> +static void
> +csched_tick(unsigned int cpu)
> +{
> + struct csched_vcpu * const svc = CSCHED_VCPU(current);
> + struct csched_dom * const sdom = svc->sdom;
> +
> + /*
> + * Accounting for running VCPU
> + *
> + * Note: Some VCPUs, such as the idle tasks, are not credit scheduled.
> + */
> + if ( likely(sdom != NULL) )
> + {
> + csched_vcpu_acct(svc, CSCHED_TICK);
> + }
> +
> + /*
> + * Accounting duty
> + *
> + * Note: Currently, this is always done by the master boot CPU. Eventually,
> + * we could distribute or at the very least cycle the duty.
> + */
> + if ( (csched_priv.master == cpu) &&
> + (schedule_data[cpu].tick % CSCHED_ACCT_NTICKS) == 0 )
> + {
> + csched_acct();
> + }
> +
> + /*
> + * Check if runq needs to be sorted
> + *
> + * Every physical CPU resorts the runq after the accounting master has
> + * modified priorities. This is a special O(n) sort and runs at most
> + * once per accounting period (currently 30 milliseconds).
> + */
> + csched_runq_sort(cpu);
> +}
> +
> +static struct csched_vcpu *
> +csched_runq_steal(struct csched_pcpu *spc, int cpu, int pri)
> +{
> + struct list_head *iter;
> + struct csched_vcpu *speer;
> + struct vcpu *vc;
> +
> + list_for_each( iter, &spc->runq )
> + {
> + speer = __runq_elem(iter);
> +
> + /*
> + * If next available VCPU here is not of higher priority than ours,
> + * this PCPU is useless to us.
> + */
> + if ( speer->pri <= CSCHED_PRI_IDLE || speer->pri <= pri )
> + {
> + CSCHED_STAT_CRANK(steal_peer_idle);
> + break;
> + }
> +
> + /* Is this VCPU runnable on our PCPU? */
> + vc = speer->vcpu;
> + BUG_ON( is_idle_vcpu(vc) );
> +
> + if ( __csched_vcpu_is_stealable(cpu, vc) )
> + {
> + /* We got a candidate. Grab it! */
> + __runq_remove(speer);
> + vc->processor = cpu;
> +
> + return speer;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static struct csched_vcpu *
> +csched_load_balance(int cpu, struct csched_vcpu *snext)
> +{
> + struct csched_pcpu *spc;
> + struct csched_vcpu *speer;
> + int peer_cpu;
> +
> + if ( snext->pri == CSCHED_PRI_IDLE )
> + CSCHED_STAT_CRANK(load_balance_idle);
> + else if ( snext->pri == CSCHED_PRI_TS_OVER )
> + CSCHED_STAT_CRANK(load_balance_over);
> + else
> + CSCHED_STAT_CRANK(load_balance_other);
> +
> + peer_cpu = cpu;
> + BUG_ON( peer_cpu != snext->vcpu->processor );
> +
> + while ( 1 )
> + {
> + /* For each PCPU in the system starting with our neighbour... */
> + peer_cpu = (peer_cpu + 1) % csched_priv.ncpus;
> + if ( peer_cpu == cpu )
> + break;
> +
> + BUG_ON( peer_cpu >= csched_priv.ncpus );
> + BUG_ON( peer_cpu == cpu );
> +
> + /*
> + * Get ahold of the scheduler lock for this peer CPU.
> + *
> + * Note: We don't spin on this lock but simply try it. Spinning could
> + * cause a deadlock if the peer CPU is also load balancing and trying
> + * to lock this CPU.
> + */
> + if ( spin_trylock(&schedule_data[peer_cpu].schedule_lock) )
> + {
> +
> + spc = CSCHED_PCPU(peer_cpu);
> + if ( unlikely(spc == NULL) )
> + {
> + CSCHED_STAT_CRANK(steal_peer_down);
> + speer = NULL;
> + }
> + else
> + {
> + speer = csched_runq_steal(spc, cpu, snext->pri);
> + }
> +
> + spin_unlock(&schedule_data[peer_cpu].schedule_lock);
> +
> + /* Got one! */
> + if ( speer )
> + {
> + CSCHED_STAT_CRANK(vcpu_migrate);
> + return speer;
> + }
> + }
> + else
> + {
> + CSCHED_STAT_CRANK(steal_trylock_failed);
> + }
> + }
> +
> +
> + /* Failed to find more important work */
> + __runq_remove(snext);
> + return snext;
> +}
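
The try-lock pattern in csched_load_balance() above can be sketched in Python as follows. This is a simplified illustration, not the patch's logic in full: Pcpu and steal_from_peers are hypothetical names, priorities are plain integers rather than the CSCHED_PRI_* constants, and the affinity check done by __csched_vcpu_is_stealable() is omitted. The point it shows is that each peer's lock is only tried, never waited on, so two CPUs balancing against each other cannot deadlock.

```python
import threading


class Pcpu:
    """Hypothetical per-CPU state: a lock plus a priority-sorted run queue."""

    def __init__(self, runq):
        self.lock = threading.Lock()
        # List of (priority, name) tuples, highest priority first.
        self.runq = list(runq)


def steal_from_peers(pcpus, cpu, pri):
    """Walk the other CPUs starting with our neighbour and try to take one
    entry whose priority is strictly higher than pri. Returns None if no
    peer has such work or if every useful peer's lock is busy."""
    ncpus = len(pcpus)
    peer = (cpu + 1) % ncpus
    while peer != cpu:
        p = pcpus[peer]
        # Try the lock without blocking; skip the peer if it is held.
        if p.lock.acquire(blocking=False):
            try:
                # The queue is sorted, so only the head can be a candidate.
                if p.runq and p.runq[0][0] > pri:
                    return p.runq.pop(0)
            finally:
                p.lock.release()
        peer = (peer + 1) % ncpus
    return None
```

A caller that gets None back simply runs its own local candidate, which is what the "Failed to find more important work" path does in the patch.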
> +
> +/*
> + * This function is in the critical path. It is designed to be simple and
> + * fast for the common case.
> + */
> +static struct task_slice
> +csched_schedule(s_time_t now)
> +{
> + const int cpu = smp_processor_id();
> + struct list_head * const runq = RUNQ(cpu);
> + struct csched_vcpu * const scurr = CSCHED_VCPU(current);
> + struct csched_vcpu *snext;
> + struct task_slice ret;
> +
> + CSCHED_STAT_CRANK(schedule);
> + CSCHED_VCPU_CHECK(current);
> +
> + /*
> + * Select next runnable local VCPU (ie top of local runq)
> + */
> + if ( vcpu_runnable(current) )
> + __runq_insert(cpu, scurr);
> + else
> + BUG_ON( is_idle_vcpu(current) || list_empty(runq) );
> +
> + snext = __runq_elem(runq->next);
> +
> + /*
> + * SMP Load balance:
> + *
> + * If the next highest priority local runnable VCPU has already eaten
> + * through its credits, look on other PCPUs to see if we have more
> + * urgent work... If not, csched_load_balance() will return snext, but
> + * already removed from the runq.
> + */
> + if ( snext->pri > CSCHED_PRI_TS_OVER )
> + __runq_remove(snext);
> + else
> + snext = csched_load_balance(cpu, snext);
> +
> + /*
> + * Update idlers mask if necessary. When we're idling, other CPUs
> + * will tickle us when they get extra work.
> + */
> + if ( snext->pri == CSCHED_PRI_IDLE )
> + {
> + if ( !cpu_isset(cpu, csched_priv.idlers) )
> + cpu_set(cpu, csched_priv.idlers);
> + }
> + else if ( cpu_isset(cpu, csched_priv.idlers) )
> + {
> + cpu_clear(cpu, csched_priv.idlers);
> + }
> +
> + /*
> + * Return task to run next...
> + */
> + ret.time = MILLISECS(CSCHED_TSLICE);
> + ret.task = snext->vcpu;
> +
> + CSCHED_VCPU_CHECK(ret.task);
> + BUG_ON( !vcpu_runnable(ret.task) );
> +
> + return ret;
> +}
> +
> +static void
> +csched_dump_vcpu(struct csched_vcpu *svc)
> +{
> + struct csched_dom * const sdom = svc->sdom;
> +
> + printk("[%i.%i] pri=%i cpu=%i",
> + svc->vcpu->domain->domain_id,
> + svc->vcpu->vcpu_id,
> + svc->pri,
> + svc->vcpu->processor);
> +
> + if ( sdom )
> + {
> + printk(" credit=%i (%d+%u) {a=%u i=%u w=%u}",
> + atomic_read(&svc->credit),
> + svc->credit_last,
> + svc->credit_incr,
> + svc->state_active,
> + svc->state_idle,
> + sdom->weight);
> + }
> +
> + printk("\n");
> +}
> +
> +static void
> +csched_dump_pcpu(int cpu)
> +{
> + struct list_head *runq, *iter;
> + struct csched_pcpu *spc;
> + struct csched_vcpu *svc;
> + int loop;
> +
> + spc = CSCHED_PCPU(cpu);
> + runq = &spc->runq;
> +
> + printk(" tick=%lu, sort=%d\n",
> + schedule_data[cpu].tick,
> + spc->runq_sort_last);
> +
> + /* current VCPU */
> + svc = CSCHED_VCPU(schedule_data[cpu].curr);
> + if ( svc )
> + {
> + printk("\trun: ");
> + csched_dump_vcpu(svc);
> + }
> +
> + loop = 0;
> + list_for_each( iter, runq )
> + {
> + svc = __runq_elem(iter);
> + if ( svc )
> + {
> + printk("\t%3d: ", ++loop);
> + csched_dump_vcpu(svc);
> + }
> + }
> +}
> +
> +static void
> +csched_dump(void)
> +{
> + struct list_head *iter_sdom, *iter_svc;
> + int loop;
> +
> + printk("info:\n"
> + "\tncpus = %u\n"
> + "\tmaster = %u\n"
> + "\tcredit = %u\n"
> + "\tcredit balance = %d\n"
> + "\tweight = %u\n"
> + "\trunq_sort = %u\n"
> + "\ttick = %dms\n"
> + "\ttslice = %dms\n"
> + "\taccounting period = %dms\n"
> + "\tdefault-weight = %d\n",
> + csched_priv.ncpus,
> + csched_priv.master,
> + csched_priv.credit,
> + csched_priv.credit_balance,
> + csched_priv.weight,
> + csched_priv.runq_sort,
> + CSCHED_TICK,
> + CSCHED_TSLICE,
> + CSCHED_ACCT_PERIOD,
> + CSCHED_DEFAULT_WEIGHT);
> +
> + printk("idlers: 0x%lx\n", csched_priv.idlers.bits[0]);
> +
> + CSCHED_STATS_PRINTK();
> +
> + printk("active vcpus:\n");
> + loop = 0;
> + list_for_each( iter_sdom, &csched_priv.active_sdom )
> + {
> + struct csched_dom *sdom;
> + sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);
> +
> + list_for_each( iter_svc, &sdom->active_vcpu )
> + {
> + struct csched_vcpu *svc;
> + svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem);
> +
> + printk("\t%3d: ", ++loop);
> + csched_dump_vcpu(svc);
> + }
> + }
> +}
> +
> +static void
> +csched_init(void)
> +{
> + spin_lock_init(&csched_priv.lock);
> + INIT_LIST_HEAD(&csched_priv.active_sdom);
> + csched_priv.ncpus = 0;
> + csched_priv.master = UINT_MAX;
> + cpus_clear(csched_priv.idlers);
> + csched_priv.weight = 0U;
> + csched_priv.credit = 0U;
> + csched_priv.credit_balance = 0;
> + csched_priv.runq_sort = 0U;
> + CSCHED_STATS_RESET();
> +}
> +
> +
> +struct scheduler sched_credit_def = {
> + .name = "SMP Credit Scheduler",
> + .opt_name = "credit",
> + .sched_id = SCHED_CREDIT,
> +
> + .alloc_task = csched_vcpu_alloc,
> + .add_task = csched_vcpu_add,
> + .sleep = csched_vcpu_sleep,
> + .wake = csched_vcpu_wake,
> + .set_affinity = csched_vcpu_set_affinity,
> +
> + .adjdom = csched_dom_cntl,
> + .free_task = csched_dom_free,
> +
> + .tick = csched_tick,
> + .do_schedule = csched_schedule,
> +
> + .dump_cpu_state = csched_dump_pcpu,
> + .dump_settings = csched_dump,
> + .init = csched_init,
> +};
>
> _______________________________________________
> Xen-changelog mailing list
> Xen-changelog@lists.xensource.com
> http://lists.xensource.com/xen-changelog
>
* Re: Re: [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing
2006-05-26 15:51 ` Anthony Liguori
@ 2006-05-26 20:13 ` Ewan Mellor
0 siblings, 0 replies; 4+ messages in thread
From: Ewan Mellor @ 2006-05-26 20:13 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel, Emmanuel Ackaouy
On Fri, May 26, 2006 at 10:51:55AM -0500, Anthony Liguori wrote:
> Just some random feedback.
>
> Xen patchbot-unstable wrote:
> >
> >+static PyObject *pyxc_csched_domain_set(XcObject *self,
> >+ PyObject *args,
> >+ PyObject *kwds)
> >+{
> >+ uint32_t domid;
> >+ uint16_t weight;
> >+ uint16_t cap;
> >+ static char *kwd_list[] = { "dom", "weight", "cap", NULL };
> >+ static char kwd_type[] = "I|HH";
> >+ struct csched_domain sdom;
> >+
> >+ weight = 0;
> >+ cap = (uint16_t)~0U;
> >+ if( !PyArg_ParseTupleAndKeywords(args, kwds, kwd_type, kwd_list,
> >+ &domid, &weight, &cap) )
> >+ return NULL;
> >+
> >+ sdom.weight = weight;
> >+ sdom.cap = cap;
> >+
> >+ if ( xc_csched_domain_set(self->xc_handle, domid, &sdom) != 0 )
> >+ return PyErr_SetFromErrno(xc_error);
> >+
> >+ Py_INCREF(zero);
> >+ return zero;
> >+}
> >
>
> It's always seemed odd that we return zero here instead of Py_RETURN_NONE.
Emmanuel will simply have followed the existing practice. I agree that
there's no sense in what's there at the moment -- feel free to patch all of
these.
> >
> >+ def domain_csched_get(self, domid):
> >+ """Get credit scheduler parameters for a domain.
> >+ """
> >+ dominfo = self.domain_lookup_by_name_or_id_nr(domid)
> >+ if not dominfo:
> >+ raise XendInvalidDomain(str(domid))
> >+ try:
> >+ return xc.csched_domain_get(dominfo.getDomid())
> >+ except Exception, ex:
> >+ raise XendError(str(ex))
> >+
> >+ def domain_csched_set(self, domid, weight, cap):
> >+ """Set credit scheduler parameters for a domain.
> >+ """
> >+ dominfo = self.domain_lookup_by_name_or_id_nr(domid)
> >+ if not dominfo:
> >+ raise XendInvalidDomain(str(domid))
> >+ try:
> >+ return xc.csched_domain_set(dominfo.getDomid(), weight, cap)
> >+ except Exception, ex:
> >+ raise XendError(str(ex))
> >+
> >
>
> Please don't catch Exception. The XML-RPC now properly propagates all
> exceptions so there's no need to rewrap things in XendError. Just let
> the normal exception propagate.
Again, feel free to patch these in the existing code as well as
Emmanuel's new code.
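
As a rough illustration of what Anthony is suggesting, the setter could look like the sketch below. It is written in Python 3 syntax (the patch itself uses Python 2's "except Exception, ex" form), and FakeXc and DomainTable are hypothetical stand-ins for the real xc binding and XendDomain so that the snippet is self-contained. The only change that matters is the absence of the blanket try/except, so the caller sees the original error.

```python
class XendInvalidDomain(Exception):
    """Stand-in for xend's exception of the same name."""


class FakeXc:
    """Hypothetical stand-in for the low-level xc binding."""

    def csched_domain_set(self, domid, weight, cap):
        if weight == 0:
            # Simulate the hypercall rejecting the parameters.
            raise OSError(22, "Invalid argument")
        return 0


class DomainTable:
    """Hypothetical stand-in for XendDomain."""

    def __init__(self, xc, domains):
        self.xc = xc
        self.domains = domains  # maps domain name -> numeric domid

    def domain_csched_set(self, dom, weight, cap):
        """Set credit scheduler parameters for a domain."""
        domid = self.domains.get(dom)
        if domid is None:
            raise XendInvalidDomain(str(dom))
        # No "except Exception" wrapper: low-level errors propagate as-is.
        return self.xc.csched_domain_set(domid, weight, cap)
```

With this shape, a failing call surfaces as the original OSError with its errno intact instead of a XendError that carries only the message string.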
> >diff -r b6937b931419 -r e539abd27a0f tools/python/xen/xm/main.py
> >--- a/tools/python/xen/xm/main.py Fri May 26 09:44:29 2006 +0100
> >+++ b/tools/python/xen/xm/main.py Fri May 26 11:14:36 2006 +0100
> >@@ -99,6 +99,7 @@ sched_sedf_help = "sched-sedf [DOM] [OPT
> > specifies another way of setting a domain's\n\
> > cpu period/slice."
> >
> >+csched_help = "csched Set or get credit scheduler parameters"
> > block_attach_help = """block-attach <DomId> <BackDev> <FrontDev> <Mode>
> > [BackDomId] Create a new virtual block device"""
> > block_detach_help = """block-detach <DomId> <DevId> Destroy a domain's virtual block device,
> >@@ -174,6 +175,7 @@ host_commands = [
> > ]
> >
> > scheduler_commands = [
> >+ "csched",
> > "sched-bvt",
> > "sched-bvt-ctxallow",
> > "sched-sedf",
> >@@ -735,6 +737,48 @@ def xm_sched_sedf(args):
> > else:
> > print_sedf(sedf_info)
>
> Seem to be breaking naming convention here. sched-csched may seem
> redundant but that's what you get for choosing a non-descriptive name
> for the scheduler in the first place ;-)
sched-credit seems appropriate.
Ewan.
Thread overview: 4+ messages
[not found] <E1Fjbdr-0005cU-OD@xenbits.xensource.com>
2006-05-26 14:00 ` [Xen-changelog] New weighted fair-share CPU scheduler w/ automatic SMP load balancing Matt Ayres
2006-05-26 14:15 ` Keir Fraser
2006-05-26 15:51 ` Anthony Liguori
2006-05-26 20:13 ` Ewan Mellor