* Introduce rt real-time scheduler for Xen
@ 2014-07-11 4:49 Meng Xu
2014-07-11 4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu
` (5 more replies)
0 siblings, 6 replies; 47+ messages in thread
From: Meng Xu @ 2014-07-11 4:49 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap,
dario.faggioli, ian.jackson, xumengpanda, lichong659, dgolomb
This series of patches adds an rt real-time scheduler to Xen.
In summary, it supports:
1) a preemptive Global Earliest Deadline First (GEDF) scheduling policy, using one global RunQ for the scheduler;
2) setting and displaying the scheduling parameters of each VCPU of each domain;
3) CPU pools.
The design of this rt scheduler is as follows:
This rt scheduler follows the preemptive Global Earliest Deadline First (GEDF) theory from the real-time scheduling literature.
Each VCPU can have a dedicated period and budget. While scheduled, a VCPU burns its budget. Each VCPU has its budget replenished at the beginning of each of its periods and discards any unused budget at the end of each period. If a VCPU runs out of budget within a period, it has to wait until its next period.
How a VCPU's budget is burned depends on the server mechanism implemented for each VCPU; the priority of VCPUs at each scheduling point is decided by the preemptive GEDF scheme.

Server mechanism: a VCPU is implemented as a deferrable server.
    When a VCPU executes on a PCPU, its budget is continuously burned;
    When a VCPU is idle but still has budget left, its budget is preserved.

Priority scheme: Global Earliest Deadline First (EDF).
    At any scheduling point, the VCPU with the earliest deadline has the highest priority.

Queue scheme: one global runqueue per CPU pool.
    The runqueue holds all runnable VCPUs.
    VCPUs in the runqueue are divided into two parts: those with and those without remaining budget.
    Within each part, VCPUs are sorted by the GEDF priority scheme.

Scheduling quantum: 1 ms; budget accounting is done at microsecond granularity.
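To make the deferrable-server and GEDF rules above concrete, here is a minimal, simplified C sketch; it is not the exact code of the patch (the field names cur_budget, cur_deadline, period and budget match the rt_vcpu structure introduced in patch 1/4, while the helper names repl_update_one() and has_higher_prio() are made up for illustration):

    /* Replenish one VCPU if its current period has ended (deferrable server):
     * unused budget is discarded, a full budget is granted for the new period,
     * and the deadline is moved to the end of that period. */
    static void repl_update_one(struct rt_vcpu *svc, s_time_t now)
    {
        if ( now >= svc->cur_deadline )
        {
            /* Skip over any whole periods that have already elapsed. */
            long count = (now - svc->cur_deadline) / MILLISECS(svc->period) + 1;
            svc->cur_deadline += count * MILLISECS(svc->period);
            svc->cur_budget = svc->budget * 1000; /* budget is kept in microseconds */
        }
    }

    /* GEDF ordering used when inserting into the global runqueue: VCPUs that
     * still have budget come before depleted ones, and within each part the
     * VCPU with the earlier cur_deadline comes first. */
    static int has_higher_prio(const struct rt_vcpu *a, const struct rt_vcpu *b)
    {
        if ( (a->cur_budget > 0) != (b->cur_budget > 0) )
            return a->cur_budget > 0;
        return a->cur_deadline <= b->cur_deadline;
    }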
-----------------------------------------------------------------------------------------------------------------------------
One scenario showing the functionality of this rt scheduler is as follows:
// list each VCPU's parameters of each domain in the cpupools using the rt scheduler
# xl sched-rt
Cpupool Pool-0: sched=EDF
Name                ID    VCPU    Period    Budget
Domain-0             0       0        10        10
Domain-0             0       1        20        20
Domain-0             0       2        30        30
Domain-0             0       3        10        10
litmus1              1       0        10         4
litmus1              1       1        10         4
// set the parameters of VCPU 1 of domain litmus1:
# xl sched-rt -d litmus1 -v 1 -p 20 -b 10
// domain litmus1's VCPU 1 parameters have been changed; display that domain's VCPU parameters separately:
# xl sched-rt -d litmus1
Name                ID    VCPU    Period    Budget
litmus1              1       0        10         4
litmus1              1       1        20        10
// list cpupool information
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               12   rt        y          2
// create a cpupool named "test" that uses the credit scheduler
# xl cpupool-cpu-remove Pool-0 11
# xl cpupool-cpu-remove Pool-0 10
# xl cpupool-create name="test" sched="credit"
# xl cpupool-cpu-add test 11
# xl cpupool-cpu-add test 10
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               10   rt        y          2
test                  2   credit    y          0
// migrate litmus1 from cpupool Pool-0 to cpupool test
# xl cpupool-migrate litmus1 test
// now litmus1 is in cpupool test
# xl sched-credit
Cpupool test: tslice=30ms ratelimit=1000us
Name                                ID   Weight   Cap
litmus1                              1      256     0
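For reference, Pool-0 runs the rt scheduler in this scenario because Xen was booted with it as the default scheduler; given the opt_name "rt" registered by patch 1/4, that corresponds to a Xen boot command line containing sched=rt, e.g. (assumed example, not taken from the series):

    multiboot /boot/xen.gz sched=rt ...

Analogously to the credit pool created above, a cpupool using the rt scheduler could also be created explicitly:

    # xl cpupool-create name="rtpool" sched="rt"
    # xl cpupool-cpu-add rtpool 10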
-----------------------------------------------------------------------------------------------------------------------------
The differences between this new rt real-time scheduler and the sedf scheduler are as follows:
1) The rt scheduler supports global EDF scheduling, while sedf only supports partitioned scheduling. With VCPU affinity (cpumask) support, the rt scheduler can also be used for partitioned scheduling by pinning each VCPU to a specific CPU (see the pinning example after this list).
2) The rt scheduler supports setting and getting the parameters of each VCPU of a domain. A domain can have multiple VCPUs with different parameters, and the rt scheduler lets the user get/set the parameters of each individual VCPU of a specific domain. (The sedf scheduler does not support this at the moment.)
3) The rt scheduler supports cpupools.
4) The rt scheduler uses a deferrable server to burn/replenish a VCPU's budget, while sedf uses a constant bandwidth server. These are just two options for implementing a global EDF real-time scheduler, and the real-time performance of both has been established in the academic literature.
(Briefly speaking, the functionality that the *SEDF* scheduler plans to implement and improve in a future release is already supported by this rt scheduler.)
(Although it is unnecessary to implement two server mechanisms, the CBS mechanism, or any other server mechanism, can be incorporated into this rt scheduler simply by modifying the two functions that burn and replenish a VCPU's budget.)
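As an illustration of point 1), partitioned EDF can be obtained with the existing xl pinning interface; for example, to dedicate PCPUs 2 and 3 to the two VCPUs of litmus1 (the PCPU numbers here are arbitrary):

    # xl vcpu-pin litmus1 0 2
    # xl vcpu-pin litmus1 1 3

Each VCPU then only competes for its own PCPU, so the global EDF policy degenerates into partitioned EDF.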
-----------------------------------------------------------------------------------------------------------------------------
TODO:
1) Improve the code of getting/setting each VCPU’s parameters. [easy]
Right now, it creates an array with LIBXL_XEN_LEGACY_MAX_VCPUS (i.e., 32) elements to bounce all of a domain's VCPU parameters between the xen toolstack and Xen. It is unnecessary for this array to have LIBXL_XEN_LEGACY_MAX_VCPUS elements.
The plan is to first get the exact number of VCPUs of a domain and then create an array with exactly that many elements to bounce between the toolstack and Xen (a rough sketch is given at the end of this TODO section).
2) Provide microsecond time precision in the xl interface instead of millisecond precision. [easy]
Right now, the rt scheduler lets the user specify each VCPU's parameters (period, budget) in milliseconds (ms). In some real-time applications, users may want to specify VCPU parameters in microseconds (us). The next step is to let the user specify VCPU parameters in microseconds, and to do the time accounting in microseconds (or nanoseconds) inside the Xen rt scheduler as well.
3) Add Xen tracing to the rt scheduler. [easy]
We will add a few xentrace tracepoints to the rt scheduler, like TRC_CSCHED2_RUNQ_POS in the credit2 scheduler, so that it can be debugged via tracing (a rough sketch is given at the end of this TODO section).
4) Improve the performance of the rt scheduler. [future work]
VCPUs of the same domain may preempt each other under the preemptive global EDF scheduling policy. Such self-switching brings no benefit to the domain but introduces extra overhead. When this situation happens, we can simply promote the priority of the currently running lower-priority VCPU and let it borrow budget from the higher-priority VCPU, so that the self-switch is avoided.
Timeline of implementing the TODOs:
We plan to finish TODOs 1), 2) and 3) within 3-4 weeks (or earlier).
Because TODO 4) would make the scheduling policy no longer pure GEDF (people who want real GEDF may not be happy with this), we look forward to hearing people's opinions.
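Rough sketches of TODOs 1) and 3) follow; both are illustrative only, and the names and exact calls are assumptions rather than final code.

For 1), the idea is simply to size the bounce array from the domain's real VCPU count, e.g. (assumed libxl calls, error handling omitted):

    libxl_dominfo info;
    libxl_domain_info(ctx, &info, domid);
    int nr_vcpus = info.vcpu_max_id + 1;                       /* exact number of VCPUs */
    xen_domctl_sched_rt_params_t *params = calloc(nr_vcpus, sizeof(*params));
    /* ... bounce 'params' (nr_vcpus elements) between the toolstack and Xen
     *     instead of a LIBXL_XEN_LEGACY_MAX_VCPUS-sized array ... */

For 3), following the TRC_CSCHED2_RUNQ_POS example in credit2, a tracepoint in the runqueue-insert path could look like this (TRC_RT_RUNQ_POS is a hypothetical, not-yet-defined trace ID, and 'pos' would have to be computed while walking the runqueue):

    {
        struct {
            unsigned dom:16, vcpu:16;
            unsigned pos;
        } d;
        d.dom  = svc->vcpu->domain->domain_id;
        d.vcpu = svc->vcpu->vcpu_id;
        d.pos  = pos;
        trace_var(TRC_RT_RUNQ_POS, 1, sizeof(d), (unsigned char *)&d);
    }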
-----------------------------------------------------------------------------------------------------------------------------
Special huge thanks to Dario Faggioli for his helpful and detailed comments on the preview version of this rt scheduler. :-)
Any comments, questions, and concerns are more than welcome! :-)
Thank you very much!
Meng
[PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor
[PATCH RFC v1 2/4] xl for rt scheduler
[PATCH RFC v1 3/4] libxl for rt scheduler
[PATCH RFC v1 4/4] libxc for rt scheduler
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 47+ messages in thread* [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu @ 2014-07-11 4:49 ` Meng Xu 2014-07-11 14:27 ` Dario Faggioli 2014-07-11 14:37 ` Andrew Cooper 2014-07-11 4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu ` (4 subsequent siblings) 5 siblings, 2 replies; 47+ messages in thread From: Meng Xu @ 2014-07-11 4:49 UTC (permalink / raw) To: xen-devel Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659, dgolomb This is the core rt scheduler patch. It adds the new real time scheduler to the hypervisor, as the non-default scheduler. This scheduler follows the pre-emptive Global EDF theory in real-time field. Each VCPU can have a dedicated period and budget. While scheduled, a VCPU burns its budget. A VCPU has its budget replenished at the beginning of each of its periods; The VCPU discards its unused budget at the end of each of its periods. If a VCPU runs out of budget in a period, it has to wait until next period. The mechanism of how to burn a VCPU's budget depends on the server mechanism implemented for each VCPU. Server mechanism: a VCPU is implemented as a deferable server. When a VCPU has a task running on it, its budget is continuously burned; When a VCPU has no task but with budget left, its budget is preserved. Priority scheme: Preemptive Global Earliest Deadline First (gEDF). At any scheduling point, the VCPU with earliest deadline has highest priority. Queue scheme: A global runqueue for each CPU pool. The runqueue holds all runnable VCPUs. VCPUs in the runqueue are divided into two parts: with and without budget. At each part, VCPUs are sorted based on EDF priority scheme. Scheduling quanta: 1 ms; but accounting the budget is in microsecond. Note: cpumask and cpupool is supported. This is still in the development phase. Signed-off-by: Sisu Xi <xisisu@gmail.com> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> --- xen/common/Makefile | 1 + xen/common/sched_rt.c | 984 +++++++++++++++++++++++++++++++++++++++++++ xen/common/schedule.c | 1 + xen/include/public/domctl.h | 19 + xen/include/xen/sched-if.h | 2 +- 5 files changed, 1006 insertions(+), 1 deletion(-) create mode 100644 xen/common/sched_rt.c diff --git a/xen/common/Makefile b/xen/common/Makefile index 3683ae3..5a23aa4 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -26,6 +26,7 @@ obj-y += sched_credit.o obj-y += sched_credit2.o obj-y += sched_sedf.o obj-y += sched_arinc653.o +obj-y += sched_rt.o obj-y += schedule.o obj-y += shutdown.o obj-y += softirq.o diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c new file mode 100644 index 0000000..41543a2 --- /dev/null +++ b/xen/common/sched_rt.c @@ -0,0 +1,984 @@ +/****************************************************************************** + * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen + * EDF scheduling is one of most popular real-time scheduling algorithm used in + * embedded field. 
+ * + * by Sisu Xi, 2013, Washington University in Saint Louis + * and Meng Xu, 2014, University of Pennsylvania + * + * based on the code of credit Scheduler + */ + +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/sched.h> +#include <xen/domain.h> +#include <xen/delay.h> +#include <xen/event.h> +#include <xen/time.h> +#include <xen/perfc.h> +#include <xen/sched-if.h> +#include <xen/softirq.h> +#include <asm/atomic.h> +#include <xen/errno.h> +#include <xen/trace.h> +#include <xen/cpu.h> +#include <xen/keyhandler.h> +#include <xen/trace.h> +#include <xen/guest_access.h> + +/* + * TODO: + * + * Migration compensation and resist like credit2 to better use cache; + * Lock Holder Problem, using yield? + * Self switch problem: VCPUs of the same domain may preempt each other; + */ + +/* + * Design: + * + * This scheduler follows the Preemptive Global EDF theory in real-time field. + * Each VCPU can have a dedicated period and budget. + * While scheduled, a VCPU burns its budget. + * A VCPU has its budget replenished at the beginning of each of its periods; + * The VCPU discards its unused budget at the end of each of its periods. + * If a VCPU runs out of budget in a period, it has to wait until next period. + * The mechanism of how to burn a VCPU's budget depends on the server mechanism + * implemented for each VCPU. + * + * Server mechanism: a VCPU is implemented as a deferable server. + * When a VCPU has a task running on it, its budget is continuously burned; + * When a VCPU has no task but with budget left, its budget is preserved. + * + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF). + * At any scheduling point, the VCPU with earliest deadline has highest priority. + * + * Queue scheme: A global runqueue for each CPU pool. + * The runqueue holds all runnable VCPUs. + * VCPUs in the runqueue are divided into two parts: with and without remaining budget. + * At each part, VCPUs are sorted based on EDF priority scheme. + * + * Scheduling quanta: 1 ms; but accounting the budget is in microsecond. + * + * Note: cpumask and cpupool is supported. + */ + +/* + * Locking: + * Just like credit2, a global system lock is used to protect the RunQ. + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus. + * + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c + * + * The functions involes RunQ and needs to grab locks are: + * dump, vcpu_insert, vcpu_remove, context_saved, + */ + + +/* + * Default parameters in ms + */ +#define RT_DEFAULT_PERIOD 10 +#define RT_DEFAULT_BUDGET 4 + +/* + * Useful macros + */ +#define RT_PRIV(_ops) ((struct rt_private *)((_ops)->sched_data)) +#define RT_VCPU(_vcpu) ((struct rt_vcpu *)(_vcpu)->sched_priv) +#define RT_DOM(_dom) ((struct rt_dom *)(_dom)->sched_priv) +#define RUNQ(_ops) (&RT_PRIV(_ops)->runq) + +/* + * Flags + */ +#define __RT_scheduled 1 +#define RT_scheduled (1<<__RT_scheduled) +#define __RT_delayed_runq_add 2 +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add) + +/* + * Used to printout debug information + */ +#define printtime() ( printk("%d : %3ld.%3ld : %-19s ", smp_processor_id(), NOW()/MILLISECS(1), NOW()%MILLISECS(1)/1000, __func__) ) + +/* + * Systme-wide private data, include a global RunQueue + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus. 
+ * It can be grabbed via vcpu_schedule_lock_irq() + */ +struct rt_private { + spinlock_t lock; /* The global coarse grand lock */ + struct list_head sdom; /* list of availalbe domains, used for dump */ + struct list_head runq; /* Ordered list of runnable VMs */ + cpumask_t cpus; /* cpumask_t of available physical cpus */ + cpumask_t tickled; /* another cpu in the queue already ticked this one */ +}; + +/* + * Virtual CPU + */ +struct rt_vcpu { + struct list_head runq_elem; /* On the runqueue list */ + struct list_head sdom_elem; /* On the domain VCPU list */ + + /* Up-pointers */ + struct rt_dom *sdom; + struct vcpu *vcpu; + + /* VCPU parameters, in milliseconds */ + s_time_t period; + s_time_t budget; + + /* VCPU current infomation */ + long cur_budget; /* current budget in microseconds */ + s_time_t last_start; /* last start time, used to calculate budget */ + s_time_t cur_deadline; /* current deadline, used to do EDF */ + unsigned flags; /* mark __RT_scheduled, etc.. */ +}; + +/* + * Domain + */ +struct rt_dom { + struct list_head vcpu; /* link its VCPUs */ + struct list_head sdom_elem; /* link list on rt_priv */ + struct domain *dom; /* pointer to upper domain */ +}; + +/* + * RunQueue helper functions + */ +static int +__vcpu_on_runq(struct rt_vcpu *svc) +{ + return !list_empty(&svc->runq_elem); +} + +static struct rt_vcpu * +__runq_elem(struct list_head *elem) +{ + return list_entry(elem, struct rt_vcpu, runq_elem); +} + +static inline void +__runq_remove(struct rt_vcpu *svc) +{ + if ( __vcpu_on_runq(svc) ) + list_del_init(&svc->runq_elem); +} + +/* + * Insert a vcpu in the RunQ basing on vcpu's deadline: + * vcpus with shorter deadline are inserted first. + * EDF schedule policy: vcpu with smaller deadline has higher priority; + * When vcpus have the same deadline, insert the current one at the head of these vcpus. 
+ */ +static void +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc) +{ + struct list_head *runq = RUNQ(ops); + struct list_head *iter; + + ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) ); + + if ( __vcpu_on_runq(svc) ) + return; + + list_for_each(iter, runq) { + struct rt_vcpu * iter_svc = __runq_elem(iter); + + if ( svc->cur_budget > 0 ) { /* svc still has budget */ + if ( iter_svc->cur_budget == 0 || + svc->cur_deadline <= iter_svc->cur_deadline ) + break; + } else { /* svc has no budget */ + if ( iter_svc->cur_budget == 0 && + svc->cur_deadline <= iter_svc->cur_deadline ) + break; + } + } + + list_add_tail(&svc->runq_elem, iter); +} + + +/* + * Debug related code, dump vcpu/cpu information + */ +static void +rt_dump_vcpu(struct rt_vcpu *svc) +{ + if ( svc == NULL ) { + printk("NULL!\n"); + return; + } +#define cpustr keyhandler_scratch + cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity); + printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ", + svc->vcpu->domain->domain_id, + svc->vcpu->vcpu_id, + svc->vcpu->processor, + svc->period, + svc->budget, + svc->cur_budget, + svc->cur_deadline, + svc->last_start, + __vcpu_on_runq(svc), + vcpu_runnable(svc->vcpu), + cpustr); + memset(cpustr, 0, sizeof(char)*1024); + cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool)); + printk("cpupool=%s\n", cpustr); +#undef cpustr +} + +static void +rt_dump_pcpu(const struct scheduler *ops, int cpu) +{ + struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu)); + + printtime(); + rt_dump_vcpu(svc); +} + +/* + * should not need lock here. only showing stuff + */ +static void +rt_dump(const struct scheduler *ops) +{ + struct list_head *iter_sdom, *iter_svc, *runq, *iter; + struct rt_private *prv = RT_PRIV(ops); + struct rt_vcpu *svc; + int cpu = 0; + int loop = 0; + + printtime(); + printk("Priority Scheme: EDF\n"); + + printk("PCPU info: \n"); + for_each_cpu(cpu, &prv->cpus) { + rt_dump_pcpu(ops, cpu); + } + + printk("Global RunQueue info: \n"); + loop = 0; + runq = RUNQ(ops); + list_for_each( iter, runq ) { + svc = __runq_elem(iter); + printk("\t%3d: ", ++loop); + rt_dump_vcpu(svc); + } + + printk("Domain info: \n"); + loop = 0; + list_for_each( iter_sdom, &prv->sdom ) { + struct rt_dom *sdom; + sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem); + printk("\tdomain: %d\n", sdom->dom->domain_id); + + list_for_each( iter_svc, &sdom->vcpu ) { + svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem); + printk("\t\t%3d: ", ++loop); + rt_dump_vcpu(svc); + } + } + + printk("\n"); +} + +/* + * Init/Free related code + */ +static int +rt_init(struct scheduler *ops) +{ + struct rt_private *prv; + + prv = xzalloc(struct rt_private); + if ( prv == NULL ) { + printk("xzalloc failed at rt_private\n"); + return -ENOMEM; + } + + ops->sched_data = prv; + spin_lock_init(&prv->lock); + INIT_LIST_HEAD(&prv->sdom); + INIT_LIST_HEAD(&prv->runq); + cpumask_clear(&prv->tickled); + cpumask_clear(&prv->cpus); + + printtime(); + printk("\n"); + + return 0; +} + +static void +rt_deinit(const struct scheduler *ops) +{ + struct rt_private *prv; + + printtime(); + printk("\n"); + + prv = RT_PRIV(ops); + if ( prv ) + xfree(prv); +} + +/* + * point per_cpu spinlock to the global system lock; all cpu have same global system lock + */ +static void * +rt_alloc_pdata(const struct scheduler *ops, int cpu) +{ + struct rt_private *prv = 
RT_PRIV(ops); + + cpumask_set_cpu(cpu, &prv->cpus); + + per_cpu(schedule_data, cpu).schedule_lock = &prv->lock; + + printtime(); + printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus)); + return (void *)1; +} + +static void +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) +{ + struct rt_private * prv = RT_PRIV(ops); + cpumask_clear_cpu(cpu, &prv->cpus); + printtime(); + printk("%s cpu=%d\n", __FUNCTION__, cpu); +} + +static void * +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom) +{ + unsigned long flags; + struct rt_dom *sdom; + struct rt_private * prv = RT_PRIV(ops); + + printtime(); + printk("dom=%d\n", dom->domain_id); + + sdom = xzalloc(struct rt_dom); + if ( sdom == NULL ) { + printk("%s, xzalloc failed\n", __func__); + return NULL; + } + + INIT_LIST_HEAD(&sdom->vcpu); + INIT_LIST_HEAD(&sdom->sdom_elem); + sdom->dom = dom; + + /* spinlock here to insert the dom */ + spin_lock_irqsave(&prv->lock, flags); + list_add_tail(&sdom->sdom_elem, &(prv->sdom)); + spin_unlock_irqrestore(&prv->lock, flags); + + return (void *)sdom; +} + +static void +rt_free_domdata(const struct scheduler *ops, void *data) +{ + unsigned long flags; + struct rt_dom *sdom = data; + struct rt_private * prv = RT_PRIV(ops); + + printtime(); + printk("dom=%d\n", sdom->dom->domain_id); + + spin_lock_irqsave(&prv->lock, flags); + list_del_init(&sdom->sdom_elem); + spin_unlock_irqrestore(&prv->lock, flags); + xfree(data); +} + +static int +rt_dom_init(const struct scheduler *ops, struct domain *dom) +{ + struct rt_dom *sdom; + + printtime(); + printk("dom=%d\n", dom->domain_id); + + /* IDLE Domain does not link on rt_private */ + if ( is_idle_domain(dom) ) { return 0; } + + sdom = rt_alloc_domdata(ops, dom); + if ( sdom == NULL ) { + printk("%s, failed\n", __func__); + return -ENOMEM; + } + dom->sched_priv = sdom; + + return 0; +} + +static void +rt_dom_destroy(const struct scheduler *ops, struct domain *dom) +{ + printtime(); + printk("dom=%d\n", dom->domain_id); + + rt_free_domdata(ops, RT_DOM(dom)); +} + +static void * +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd) +{ + struct rt_vcpu *svc; + s_time_t now = NOW(); + long count; + + /* Allocate per-VCPU info */ + svc = xzalloc(struct rt_vcpu); + if ( svc == NULL ) { + printk("%s, xzalloc failed\n", __func__); + return NULL; + } + + INIT_LIST_HEAD(&svc->runq_elem); + INIT_LIST_HEAD(&svc->sdom_elem); + svc->flags = 0U; + svc->sdom = dd; + svc->vcpu = vc; + svc->last_start = 0; /* init last_start is 0 */ + + svc->period = RT_DEFAULT_PERIOD; + if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) { + svc->budget = RT_DEFAULT_BUDGET; + } else { + svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */ + } + + count = (now/MILLISECS(svc->period)) + 1; + /* sync all VCPU's start time to 0 */ + svc->cur_deadline += count*MILLISECS(svc->period); + + svc->cur_budget = svc->budget*1000; /* counting in microseconds level */ + /* Debug only: dump new vcpu's info */ + printtime(); + rt_dump_vcpu(svc); + + return svc; +} + +static void +rt_free_vdata(const struct scheduler *ops, void *priv) +{ + struct rt_vcpu *svc = priv; + + /* Debug only: dump freed vcpu's info */ + printtime(); + rt_dump_vcpu(svc); + xfree(svc); +} + +static void +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc) +{ + struct rt_vcpu *svc = RT_VCPU(vc); + + /* Debug only: dump info of vcpu to insert */ + printtime(); + rt_dump_vcpu(svc); + + /* IDLE VCPU not allowed on RunQ */ + if ( is_idle_vcpu(vc) ) + 
return; + + list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu); /* add to dom vcpu list */ +} + +static void +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc) +{ + struct rt_vcpu * const svc = RT_VCPU(vc); + struct rt_dom * const sdom = svc->sdom; + + printtime(); + rt_dump_vcpu(svc); + + BUG_ON( sdom == NULL ); + BUG_ON( __vcpu_on_runq(svc) ); + + if ( !is_idle_vcpu(vc) ) { + list_del_init(&svc->sdom_elem); + } +} + +/* + * Pick a valid CPU for the vcpu vc + * Valid CPU of a vcpu is intesection of vcpu's affinity and available cpus + */ +static int +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc) +{ + cpumask_t cpus; + cpumask_t *online; + int cpu; + struct rt_private * prv = RT_PRIV(ops); + + online = cpupool_scheduler_cpumask(vc->domain->cpupool); + cpumask_and(&cpus, &prv->cpus, online); + cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity); + + cpu = cpumask_test_cpu(vc->processor, &cpus) + ? vc->processor + : cpumask_cycle(vc->processor, &cpus); + ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) ); + + return cpu; +} + +/* + * Implemented as deferrable server. + * Different server mechanism has different implementation. + * burn budget at microsecond level. + */ +static void +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) { + s_time_t delta; + unsigned int consume; + long count = 0; + + /* first time called for this svc, update last_start */ + if ( svc->last_start == 0 ) { + svc->last_start = now; + return; + } + + /* don't burn budget for idle VCPU */ + if ( is_idle_vcpu(svc->vcpu) ) { + return; + } + + /* don't burn budget for Domain-0, RT-Xen use only */ + if ( svc->sdom->dom->domain_id == 0 ) { + return; + } + + /* update deadline info */ + delta = now - svc->cur_deadline; + if ( delta >= 0 ) { + count = ( delta/MILLISECS(svc->period) ) + 1; + svc->cur_deadline += count * MILLISECS(svc->period); + svc->cur_budget = svc->budget * 1000; + return; + } + + delta = now - svc->last_start; + if ( delta < 0 ) { + printk("%s, delta = %ld for ", __func__, delta); + rt_dump_vcpu(svc); + svc->last_start = now; /* update last_start */ + svc->cur_budget = 0; + return; + } + + if ( svc->cur_budget == 0 ) return; + + /* burn at microseconds level */ + consume = ( delta/MICROSECS(1) ); + if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++; + + svc->cur_budget -= consume; + if ( svc->cur_budget < 0 ) svc->cur_budget = 0; +} + +/* + * RunQ is sorted. Pick first one within cpumask. 
If no one, return NULL + * lock is grabbed before calling this function + */ +static struct rt_vcpu * +__runq_pick(const struct scheduler *ops, cpumask_t mask) +{ + struct list_head *runq = RUNQ(ops); + struct list_head *iter; + struct rt_vcpu *svc = NULL; + struct rt_vcpu *iter_svc = NULL; + cpumask_t cpu_common; + cpumask_t *online; + struct rt_private * prv = RT_PRIV(ops); + + list_for_each(iter, runq) { + iter_svc = __runq_elem(iter); + + /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */ + online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool); + cpumask_and(&cpu_common, online, &prv->cpus); + cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity); + cpumask_and(&cpu_common, &mask, &cpu_common); + if ( cpumask_empty(&cpu_common) ) + continue; + + if ( iter_svc->cur_budget <= 0 ) + continue; + + svc = iter_svc; + break; + } + + return svc; +} + +/* + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq + * lock is grabbed before calling this function + */ +static void +__repl_update(const struct scheduler *ops, s_time_t now) +{ + struct list_head *runq = RUNQ(ops); + struct list_head *iter; + struct list_head *tmp; + struct rt_vcpu *svc = NULL; + + s_time_t diff; + long count; + + list_for_each_safe(iter, tmp, runq) { + svc = __runq_elem(iter); + + diff = now - svc->cur_deadline; + if ( diff > 0 ) { + count = (diff/MILLISECS(svc->period)) + 1; + svc->cur_deadline += count * MILLISECS(svc->period); + svc->cur_budget = svc->budget * 1000; + __runq_remove(svc); + __runq_insert(ops, svc); + } + } +} + +/* + * schedule function for rt scheduler. + * The lock is already grabbed in schedule.c, no need to lock here + */ +static struct task_slice +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled) +{ + const int cpu = smp_processor_id(); + struct rt_private * prv = RT_PRIV(ops); + struct rt_vcpu * const scurr = RT_VCPU(current); + struct rt_vcpu * snext = NULL; + struct task_slice ret; + + /* clear ticked bit now that we've been scheduled */ + if ( cpumask_test_cpu(cpu, &prv->tickled) ) + cpumask_clear_cpu(cpu, &prv->tickled); + + /* burn_budget would return for IDLE VCPU */ + burn_budgets(ops, scurr, now); + + __repl_update(ops, now); + + if ( tasklet_work_scheduled ) { + snext = RT_VCPU(idle_vcpu[cpu]); + } else { + cpumask_t cur_cpu; + cpumask_clear(&cur_cpu); + cpumask_set_cpu(cpu, &cur_cpu); + snext = __runq_pick(ops, cur_cpu); + if ( snext == NULL ) + snext = RT_VCPU(idle_vcpu[cpu]); + + /* if scurr has higher priority and budget, still pick scurr */ + if ( !is_idle_vcpu(current) && + vcpu_runnable(current) && + scurr->cur_budget > 0 && + ( is_idle_vcpu(snext->vcpu) || + scurr->cur_deadline <= snext->cur_deadline ) ) { + snext = scurr; + } + } + + if ( snext != scurr && + !is_idle_vcpu(current) && + vcpu_runnable(current) ) { + set_bit(__RT_delayed_runq_add, &scurr->flags); + } + + snext->last_start = now; + ret.migrated = 0; + if ( !is_idle_vcpu(snext->vcpu) ) { + if ( snext != scurr ) { + __runq_remove(snext); + set_bit(__RT_scheduled, &snext->flags); + } + if ( snext->vcpu->processor != cpu ) { + snext->vcpu->processor = cpu; + ret.migrated = 1; + } + } + + ret.time = MILLISECS(1); + ret.task = snext->vcpu; + + return ret; +} + +/* + * Remove VCPU from RunQ + * The lock is already grabbed in schedule.c, no need to lock here + */ +static void +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc) +{ + struct rt_vcpu * const svc = RT_VCPU(vc); + + BUG_ON( 
is_idle_vcpu(vc) ); + + if ( curr_on_cpu(vc->processor) == vc ) { + cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ); + return; + } + + if ( __vcpu_on_runq(svc) ) { + __runq_remove(svc); + } + + clear_bit(__RT_delayed_runq_add, &svc->flags); +} + +/* + * Pick a vcpu on a cpu to kick out to place the running candidate + * Called by wake() and context_saved() + * We have a running candidate here, the kick logic is: + * Among all the cpus that are within the cpu affinity + * 1) if the new->cpu is idle, kick it. This could benefit cache hit + * 2) if there are any idle vcpu, kick it. + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one + * if snext has higher priority, kick it. + * + * TODO: + * 1) what if these two vcpus belongs to the same domain? + * replace a vcpu belonging to the same domain introduces more overhead + * + * lock is grabbed before calling this function + */ +static void +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new) +{ + struct rt_private * prv = RT_PRIV(ops); + struct rt_vcpu * scheduled = NULL; /* lowest priority scheduled */ + struct rt_vcpu * iter_svc; + struct vcpu * iter_vc; + int cpu = 0; + cpumask_t not_tickled; /* not tickled cpus */ + cpumask_t *online; + + if ( new == NULL || is_idle_vcpu(new->vcpu) ) return; + + online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool); + cpumask_and(¬_tickled, online, &prv->cpus); + cpumask_and(¬_tickled, ¬_tickled, new->vcpu->cpu_hard_affinity); + cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); + + /* 1) if new's previous cpu is idle, kick it for cache benefit */ + if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) { + cpumask_set_cpu(new->vcpu->processor, &prv->tickled); + cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ); + return; + } + + /* 2) if there are any idle pcpu, kick it */ + /* The same loop also find the one with lowest priority */ + for_each_cpu(cpu, ¬_tickled) { + iter_vc = curr_on_cpu(cpu); + if ( is_idle_vcpu(iter_vc) ) { + cpumask_set_cpu(cpu, &prv->tickled); + cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); + return; + } + iter_svc = RT_VCPU(iter_vc); + if ( scheduled == NULL || + iter_svc->cur_deadline > scheduled->cur_deadline ) { + scheduled = iter_svc; + } + } + + /* 3) candicate has higher priority, kick out the lowest priority vcpu */ + if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) { + cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled); + cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ); + } + return; +} + +/* + * Should always wake up runnable vcpu, put it back to RunQ. + * Check priority to raise interrupt + * The lock is already grabbed in schedule.c, no need to lock here + * TODO: what if these two vcpus belongs to the same domain? 
+ */ +static void +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) +{ + struct rt_vcpu * const svc = RT_VCPU(vc); + s_time_t diff; + s_time_t now = NOW(); + long count = 0; + struct rt_private * prv = RT_PRIV(ops); + struct rt_vcpu * snext = NULL; /* highest priority on RunQ */ + + BUG_ON( is_idle_vcpu(vc) ); + + if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return; + + /* on RunQ, just update info is ok */ + if ( unlikely(__vcpu_on_runq(svc)) ) return; + + /* If context hasn't been saved yet, set flag so it will add later */ + if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) { + set_bit(__RT_delayed_runq_add, &svc->flags); + return; + } + + /* update deadline info */ + diff = now - svc->cur_deadline; + if ( diff >= 0 ) { + count = ( diff/MILLISECS(svc->period) ) + 1; + svc->cur_deadline += count * MILLISECS(svc->period); + svc->cur_budget = svc->budget * 1000; + } + + __runq_insert(ops, svc); + __repl_update(ops, now); + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL valid cpus */ + runq_tickle(ops, snext); + + return; +} + +/* + * scurr has finished context switch, insert it back to the RunQ, + * and then pick the highest priority vcpu from runq to run + */ +static void +rt_context_saved(const struct scheduler *ops, struct vcpu *vc) +{ + struct rt_vcpu * svc = RT_VCPU(vc); + struct rt_vcpu * snext = NULL; + struct rt_private * prv = RT_PRIV(ops); + spinlock_t *lock; + + clear_bit(__RT_scheduled, &svc->flags); + if ( is_idle_vcpu(vc) ) return; + + lock = vcpu_schedule_lock_irq(vc); + if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && + likely(vcpu_runnable(vc)) ) { + __runq_insert(ops, svc); + __repl_update(ops, NOW()); + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL cpus */ + runq_tickle(ops, snext); + } + vcpu_schedule_unlock_irq(lock, vc); +} + +/* + * set/get each vcpu info of each domain + */ +static int +rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op) +{ + xen_domctl_sched_rt_params_t local_sched; + struct rt_dom * const sdom = RT_DOM(d); + struct list_head *iter; + int vcpu_index = 0; + int rc = -EINVAL; + + switch ( op->cmd ) + { + case XEN_DOMCTL_SCHEDOP_getinfo: + /* for debug use, whenever adjust Dom0 parameter, do global dump */ + if ( d->domain_id == 0 ) { + rt_dump(ops); + } + + local_sched.num_vcpus = 0; + list_for_each( iter, &sdom->vcpu ) { + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); + + ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS); + local_sched.vcpus[vcpu_index].budget = svc->budget; + local_sched.vcpus[vcpu_index].period = svc->period; + + vcpu_index++; + } + local_sched.num_vcpus = vcpu_index; + copy_to_guest(op->u.rt.schedule, &local_sched, 1); + rc = 0; + break; + case XEN_DOMCTL_SCHEDOP_putinfo: + copy_from_guest(&local_sched, op->u.rt.schedule, 1); + list_for_each( iter, &sdom->vcpu ) { + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); + + if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */ + vcpu_index = local_sched.vcpu_index; + + if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) { + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n", + vcpu_index); + }else{ + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n", + vcpu_index, local_sched.vcpus[vcpu_index].period, + local_sched.vcpus[vcpu_index].budget); + } + + svc->period = local_sched.vcpus[vcpu_index].period; + svc->budget = local_sched.vcpus[vcpu_index].budget; + + break; + } + } + 
rc = 0; + break; + } + + return rc; +} + +static struct rt_private _rt_priv; + +const struct scheduler sched_rt_def = { + .name = "SMP RT Scheduler", + .opt_name = "rt", + .sched_id = XEN_SCHEDULER_RT, + .sched_data = &_rt_priv, + + .dump_cpu_state = rt_dump_pcpu, + .dump_settings = rt_dump, + .init = rt_init, + .deinit = rt_deinit, + .alloc_pdata = rt_alloc_pdata, + .free_pdata = rt_free_pdata, + .alloc_domdata = rt_alloc_domdata, + .free_domdata = rt_free_domdata, + .init_domain = rt_dom_init, + .destroy_domain = rt_dom_destroy, + .alloc_vdata = rt_alloc_vdata, + .free_vdata = rt_free_vdata, + .insert_vcpu = rt_vcpu_insert, + .remove_vcpu = rt_vcpu_remove, + + .adjust = rt_dom_cntl, + + .pick_cpu = rt_cpu_pick, + .do_schedule = rt_schedule, + .sleep = rt_vcpu_sleep, + .wake = rt_vcpu_wake, + .context_saved = rt_context_saved, + + .yield = NULL, + .migrate = NULL, +}; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index e9eb0bc..2d13966 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = { &sched_sedf_def, &sched_credit_def, &sched_credit2_def, + &sched_rt_def, &sched_arinc653_def, }; diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 5b11bbf..d1a8201 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -339,6 +339,20 @@ struct xen_domctl_max_vcpus { typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t); +/* + * This structure is used to pass to rt scheduler from a + * privileged domain to Xen + */ +struct xen_domctl_sched_rt_params { + struct { + signed long period; /* s_time_t type */ + signed long budget; + } vcpus[XEN_LEGACY_MAX_VCPUS]; + uint16_t num_vcpus; + uint16_t vcpu_index; +}; +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t; +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t); /* XEN_DOMCTL_scheduler_op */ /* Scheduler types. */ @@ -346,6 +360,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t); #define XEN_SCHEDULER_CREDIT 5 #define XEN_SCHEDULER_CREDIT2 6 #define XEN_SCHEDULER_ARINC653 7 +#define XEN_SCHEDULER_RT 8 + /* Set or get info? */ #define XEN_DOMCTL_SCHEDOP_putinfo 0 #define XEN_DOMCTL_SCHEDOP_getinfo 1 @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { struct xen_domctl_sched_credit2 { uint16_t weight; } credit2; + struct xen_domctl_sched_rt{ + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule; + } rt; } u; }; typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t; diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 4164dff..e452d32 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def; extern const struct scheduler sched_credit_def; extern const struct scheduler sched_credit2_def; extern const struct scheduler sched_arinc653_def; - +extern const struct scheduler sched_rt_def; struct cpupool { -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu @ 2014-07-11 14:27 ` Dario Faggioli 2014-07-11 14:37 ` Andrew Cooper 1 sibling, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 14:27 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Jan Beulich, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 36357 bytes --] [Adding Jan] On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > This is the core rt scheduler patch. It adds the new real time scheduler to > the hypervisor, as the non-default scheduler. > So, honestly, I think these two lines above can just go away... The subject can be improved a bit too, probably. I don't know... something like "scheduling: introduce XEN_SCHEDULER_RT" or "sched: implement XEN_SCHEDULER_RT scheduling policy", etc. > This scheduler follows the pre-emptive Global EDF theory in real-time field. > Each VCPU can have a dedicated period and budget. > While scheduled, a VCPU burns its budget. > A VCPU has its budget replenished at the beginning of each of its periods; > The VCPU discards its unused budget at the end of each of its periods. > If a VCPU runs out of budget in a period, it has to wait until next period. > The mechanism of how to burn a VCPU's budget depends on the server mechanism > implemented for each VCPU. > > Server mechanism: a VCPU is implemented as a deferable server. > When a VCPU has a task running on it, its budget is continuously > burned; > "When a VCPU is scheduled", or "When a VCPU executes on a PCPU". Let's avoid messing with _why_ a VCPU executes, i.e., what the VCPU is actually doing... That's guest's business, not ours! :-) > When a VCPU has no task but with budget left, its budget is > preserved. > Ditto. > Priority scheme: Preemptive Global Earliest Deadline First (gEDF). > At any scheduling point, the VCPU with earliest deadline has highest > priority. > > Queue scheme: A global runqueue for each CPU pool. ^Runqueue, it's a bit more clear, I think. > The runqueue holds all runnable VCPUs. > VCPUs in the runqueue are divided into two parts: with and without budget. > At each part, VCPUs are sorted based on EDF priority scheme. > > Scheduling quanta: 1 ms; but accounting the budget is in microsecond. > > Note: cpumask and cpupool is supported. > No need to say this in the changelog. Actually, it's probably useful to say it right now, given this is an RFC... Not so much when this won't be an RFC any longer. > This is still in the development phase. > Right, and threfore let's concentrate on the code... we'll refine the changelog version after version. :-) > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c > new file mode 100644 > index 0000000..41543a2 > --- /dev/null > +++ b/xen/common/sched_rt.c > +/* > + * TODO: > + * > + * Migration compensation and resist like credit2 to better use cache; > + * Lock Holder Problem, using yield? > + * Self switch problem: VCPUs of the same domain may preempt each other; > + */ > + This last one is an interesting point, even for non real-time scheduling. I gave some thoughts at this a few times, but never had time to properly gather ideas and/or collect some numbers... if you have anything like that, feel free to share (in another thread, as it's, I think, more general). > +/* > + * Locking: > + * Just like credit2, a global system lock is used to protect the RunQ. 
> + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus. > + * Mmm... Actually, I think credit2 at least tries to do something different. Well, each RunQ does have its own lock, but there are more than 1 RunQ in credit2 (or, again, at least that's the design. There are a few bugs in the credit2 implementation, but that does not matter here!! :-P). No big deal, of course, just perhaps try to describe a bit better what's the intended locking scheme. :-) > + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c > + * > + * The functions involes RunQ and needs to grab locks are: > + * dump, vcpu_insert, vcpu_remove, context_saved, > + */ > + > + > +/* > + * Default parameters in ms > + */ > +#define RT_DEFAULT_PERIOD 10 > +#define RT_DEFAULT_BUDGET 4 > + ms? us? Can you put a comment saying what time units these 10 and 4 are. > +/* > + * Insert a vcpu in the RunQ basing on vcpu's deadline: > + * vcpus with shorter deadline are inserted first. > + * EDF schedule policy: vcpu with smaller deadline has higher priority; > The two line above sound redundant. I'd ditch the latter. > + * When vcpus have the same deadline, insert the current one at the head of these vcpus. > + */ > Knowing how ties are broken is useful. However, I'm not sure what you mean with "insert the current one at the head of these vcpus". Can you rephrase it? > +static void > +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + > + ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) ); > + Coding style: this line looks very long. :-) > + if ( __vcpu_on_runq(svc) ) > + return; > + Can this happen? I mean the fact that you try to put a VCPU on a RunQ while it's already there. If yes, and if that's by design, ok, but that's the sort of things I usually want to keep under control, as they're frequently hiding bugs etc. Again, I'm not saying this is the case. What I'd do is, if it can happen, put a comment explaining under what circumstances that is the case. If it can't, then put down another ASSERT rather than just returning. > + list_for_each(iter, runq) { > + struct rt_vcpu * iter_svc = __runq_elem(iter); > + > + if ( svc->cur_budget > 0 ) { /* svc still has budget */ > Put the comment above (minor thing, though). > + if ( iter_svc->cur_budget == 0 || > + svc->cur_deadline <= iter_svc->cur_deadline ) > + break; > + } else { /* svc has no budget */ > + if ( iter_svc->cur_budget == 0 && > + svc->cur_deadline <= iter_svc->cur_deadline ) > + break; > + } > Bear with me, as I'm rally thinking out loud here. Have you though at, or perhaps explored the solution of using two lists --one for vcpus with budget the other for depleted ones-- instead of only one divided in two parts. Perhaps it's not a big deal, but looking at this code looks like inserting a fully depleted vcpu potentially involves pointlessly going through all the elements with budget>0, which also means holding the global RunQ spinlock longer than one would expect. Thoughts? Also, coding style: inside the hypervisor, we do: if ( bla ) { blabla; } else /* or else if (bleblu ) */ { bleah } (Of course, this needs fixing everywhere). 
> +/* > + * Init/Free related code > + */ > +static int > +rt_init(struct scheduler *ops) > +{ > + struct rt_private *prv; > + > + prv = xzalloc(struct rt_private); > + if ( prv == NULL ) { > + printk("xzalloc failed at rt_private\n"); > + return -ENOMEM; > + } > + > + ops->sched_data = prv; > + spin_lock_init(&prv->lock); > + INIT_LIST_HEAD(&prv->sdom); > + INIT_LIST_HEAD(&prv->runq); > + cpumask_clear(&prv->tickled); > + cpumask_clear(&prv->cpus); > + > + printtime(); > + printk("\n"); > + Perhaps printtime() can take a string argument and append it to the time it prints. Well, it probably does not matter much, as I don't think things like printtime() should make it to the final versions. I mean, it's useful now, but when upstream, debugging happens via tracing, so it hopefully will become useless. Actually, I think adding a few xentrace tracepoints would be a great plus, as it'll allow Xen devs to start investigating and debugging this the usual 'Xen way'. Look at entries like TRC_CSCHED2_RUNQ_POS in credit2, or similar ones in credit. > + return 0; > +} > +static void * > +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd) > +{ > + struct rt_vcpu *svc; > + s_time_t now = NOW(); > + long count; > + > + /* Allocate per-VCPU info */ > + svc = xzalloc(struct rt_vcpu); > + if ( svc == NULL ) { > + printk("%s, xzalloc failed\n", __func__); > + return NULL; > + } > + > + INIT_LIST_HEAD(&svc->runq_elem); > + INIT_LIST_HEAD(&svc->sdom_elem); > + svc->flags = 0U; > + svc->sdom = dd; > + svc->vcpu = vc; > + svc->last_start = 0; /* init last_start is 0 */ > + > + svc->period = RT_DEFAULT_PERIOD; > + if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) { > + svc->budget = RT_DEFAULT_BUDGET; > + } else { > + svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */ > + } > + Oh, so idle VCPUs and Dom0's VCPUs get 100%? Well, for one, write this explicitly somewhere. Anyway, idle, not so much important, it's kind of obvious actually. Dom0, it's certainly worth a few words, both here and up in the 'Design' description. I've got to think at whether or not that's a sane default setting. For sure Dom0 is a very peculiar domain, but the beauty (well, one of the beauties :-D) of Xen is that Dom0 is actually just one domain, there may exist other "special" domains (driver/stub domains, etc)... So let's try not making Dom0 itself too special, ok? :-P > + count = (now/MILLISECS(svc->period)) + 1; > + /* sync all VCPU's start time to 0 */ > + svc->cur_deadline += count*MILLISECS(svc->period); > + > + svc->cur_budget = svc->budget*1000; /* counting in microseconds level */ > + /* Debug only: dump new vcpu's info */ > + printtime(); > + rt_dump_vcpu(svc); > + > + return svc; > +} > +static void > +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu *svc = RT_VCPU(vc); > + > + /* Debug only: dump info of vcpu to insert */ > + printtime(); > + rt_dump_vcpu(svc); > + > + /* IDLE VCPU not allowed on RunQ */ > + if ( is_idle_vcpu(vc) ) > + return; > + Mmm... Why are we speaking about the RunQ... this is not the function adding a VCPU to the RunQ, is it? 
> + list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu); /* add to dom vcpu list */ > +} > + > +static void > +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + struct rt_dom * const sdom = svc->sdom; > + > + printtime(); > + rt_dump_vcpu(svc); > + > + BUG_ON( sdom == NULL ); > + BUG_ON( __vcpu_on_runq(svc) ); > + > + if ( !is_idle_vcpu(vc) ) { > + list_del_init(&svc->sdom_elem); > + } > +} > + About these two functions (rtglobal_vcpu_insert and rtglobal_vcpu_remove), in addition to what I wrote yesterday. It looks to me that, when calling them, for instance, from sched_move_domain() in schedule.c, what you do is removing the VCPU from the list of VCPUs of its domain, and then add it back there. Sure the net effect is ok, but it does certainly not look like the best possible thing that should happen in such case. I can't be positive about what I'm about to say, but it looks to me that, given how the insert_vcpu hook is used, you may not want to define it (just don't put any .insert_vcpu=xxx nor .remove_vcpu=yyy there when initializing sched_rtglobal_def, at the bottom of the file). Of course you'd need to find another place from where to add the vcpu to the list of its own domain's VCPUs, but that should be easy. Or am I missing something? For sure, in the way it looks not, it is now, although probably functional (it does look so, I haven't tested this but I trust you did :-) ), it's rather inefficient and confusing to read. > +/* > + * Implemented as deferrable server. > + * Different server mechanism has different implementation. > Kill this line, until (if ever) we'll actually have more than one server mechanism. > + * burn budget at microsecond level. > + */ I.e., using microsecs everythere, including in the interface, would kill the need of doing these conversions inside the scheduler logic, which is, to me, is already a fair reason to go that way. I see you've got this in your TODO list already, so good. :-) > +static void > +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) { > + s_time_t delta; > + unsigned int consume; > + long count = 0; > + > + /* first time called for this svc, update last_start */ > + if ( svc->last_start == 0 ) { > + svc->last_start = now; > + return; > + } > + > + /* don't burn budget for idle VCPU */ > + if ( is_idle_vcpu(svc->vcpu) ) { > + return; > + } > + This check on is_idle_vcpu() can be the very first thing you do, can't it? I mean, is it important that we update last_start for the idle VCPUs? If not, move it up. > + /* don't burn budget for Domain-0, RT-Xen use only */ > + if ( svc->sdom->dom->domain_id == 0 ) { > + return; > + } > + Oh, so Dom0 is not inside a (deferrable) server? Nope, that needs to change. It's probably ok to give it 100% bandwidth by default (still thinking about this), but it certainly needs to be possible for a sysadmin to change that, assign to Dom0's VCPUs budgets and periods, and have it scheduled like all the other domains. Is it going to be hard to make happen? I honestly don't think so. I think that just killing all of this (domain_id == 0) checks would pretty much do... > + /* update deadline info */ > Be a bit more explicit in the comment, in explaining what's going on here, i.e., deadline is in the past, so we need to compensate, etc. etc. Also, can you provide me a little insight on what could cause delta to be non-negative? Have you observed it? If yes, a lot? Under what circumstances? 
More important, is it ok to just silently fix things, or should we warn the user/sysadmin that we're lagging behind? > + delta = now - svc->cur_deadline; > + if ( delta >= 0 ) { > + count = ( delta/MILLISECS(svc->period) ) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > + return; > + } > + So, in case deadline was > 0, deadline was in the past. What about cur_budget? I mean, what about overruns? I know it's something that should not happen, but if it does, is forgetting about it the best option? However, I see budget is handled below, so I'll comment there. > + delta = now - svc->last_start; > + if ( delta < 0 ) { > + printk("%s, delta = %ld for ", __func__, delta); > + rt_dump_vcpu(svc); > + svc->last_start = now; /* update last_start */ > + svc->cur_budget = 0; > + return; > + } > + As said above, put down some words of comment on when this can happen and why you think this is the best recovery action, please. :-) > + if ( svc->cur_budget == 0 ) return; > + > + /* burn at microseconds level */ > + consume = ( delta/MICROSECS(1) ); > + if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++; > + ditto (about conversions). > + svc->cur_budget -= consume; > + if ( svc->cur_budget < 0 ) svc->cur_budget = 0; > What I've always done is having the task (in this case, that would be the vcpu) to pay back for that. So if it's max_budget was 100 and I at some point find out it managed to execute for 115, i.e., it overran by 15, I only replenish it to 100-15=85. I'm saying this here because, even if it's not here that you replenish, all the mechanisms I can think of to take care of overruns need the current budget to actually be negative, when replenishment happens, in order to be able to take overrun itself into account. So what do you think? Can overrun happen in this scheduler? If yes, how severely? Don't you think it's something important to consider? > +} > + > +/* > + * RunQ is sorted. Pick first one within cpumask. If no one, return NULL > + * lock is grabbed before calling this function > + */ > +static struct rt_vcpu * > +__runq_pick(const struct scheduler *ops, cpumask_t mask) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + struct rt_vcpu *svc = NULL; > + struct rt_vcpu *iter_svc = NULL; > + cpumask_t cpu_common; > + cpumask_t *online; > + struct rt_private * prv = RT_PRIV(ops); > + > + list_for_each(iter, runq) { > + iter_svc = __runq_elem(iter); > + > + /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */ > What's in priv->cpus, BTW? How is it different form the cpupool online mask (returned in 'online' by cpupool_scheduler_cpumask() )? 
> + online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool); > + cpumask_and(&cpu_common, online, &prv->cpus); > + cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity); > + cpumask_and(&cpu_common, &mask, &cpu_common); > + if ( cpumask_empty(&cpu_common) ) > + continue; > + > + if ( iter_svc->cur_budget <= 0 ) > + continue; > + > + svc = iter_svc; > + break; > + } > + > + return svc; > +} > + > +/* > + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq > + * lock is grabbed before calling this function > + */ > +static void > +__repl_update(const struct scheduler *ops, s_time_t now) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + struct list_head *tmp; > + struct rt_vcpu *svc = NULL; > + > + s_time_t diff; > + long count; > + > + list_for_each_safe(iter, tmp, runq) { > + svc = __runq_elem(iter); > + Wow, so we scan all the RunQ... :-/ Let's check when this is called, but I personally really don't like this things. :-( Can't we use something like a timer for replenishments? > + diff = now - svc->cur_deadline; > + if ( diff > 0 ) { > + count = (diff/MILLISECS(svc->period)) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > Payback for possible overrun again? In any case, payback or not, this is the same code from inside burn_budget() above. An helper function is probably what we want. > + __runq_remove(svc); > + __runq_insert(ops, svc); > + } > + } > +} > + > +/* > + * schedule function for rt scheduler. > + * The lock is already grabbed in schedule.c, no need to lock here > + */ > +static struct task_slice > +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled) > +{ > + const int cpu = smp_processor_id(); > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * const scurr = RT_VCPU(current); > + struct rt_vcpu * snext = NULL; > + struct task_slice ret; > + struct task_slice ret = { .migrated = 0 }; should work, and should allow you to get rid of the ret.migrated=0 below. It's a minor thing, though, mostly a matter of taste, I think. > + /* clear ticked bit now that we've been scheduled */ > + if ( cpumask_test_cpu(cpu, &prv->tickled) ) > + cpumask_clear_cpu(cpu, &prv->tickled); > + > + /* burn_budget would return for IDLE VCPU */ > + burn_budgets(ops, scurr, now); > + > + __repl_update(ops, now); > + And here it is. So, at each tick (and possibly even more frequently) we scan the full RunQ, just to update the deadlines and perform replenishments. I confirm that I don't like this very much. burn_budget() is something that should indeed happen here. __repl_update(), OTOH, should somehow be made event based. I can't provide a deferrable server example out of the top of my head. IIRC, in my Linux CBS implementation I was setting a timer when a task is inserted in the runqueue, to fire at the task's absolute deadline time (well, after one period, if you want to allow deadline different than periods). At that point, you either find the task still running, with some leftover budget, you find it still running with no (or very few) budget, which means overrun (or something very close to overrun), or (most of the cases) you find it stopped, because it ran out of budget. That's when --in the timer handler-- you issue the replenishment and postpone the deadline accordingly (depending on which one of the 3 above the actual situation was). 
When re-enqueueing the task, with new budget and new deadline, I was also setting the timer again. Of course, when the task was removed from the runqueue because it blocked by itself (i.e., _not_ because he depleted its budget), I was stopping the timer too. I've also implemented the sporadic server on Linux, using the exact same mechanism, although it was a bit more complex, as the sporadic server handles replenishments in a more complicated way (it replenishes in chunks, so I had to set the timer for each chunk, one after the other). I'm sure something like this can be done here, in Xen, and with DS as an algorithm. It shouldn't even be too hard to just use either only one timer, or, perhaps, one timer per PCPU. I'm quite sure we don't want one timer per VCPU, but this is of course something we can discuss. For now, what do you think about the timer idea? > + if ( tasklet_work_scheduled ) { > + snext = RT_VCPU(idle_vcpu[cpu]); > + } else { > + cpumask_t cur_cpu; > + cpumask_clear(&cur_cpu); > + cpumask_set_cpu(cpu, &cur_cpu); > + snext = __runq_pick(ops, cur_cpu); > + if ( snext == NULL ) > + snext = RT_VCPU(idle_vcpu[cpu]); > + > + /* if scurr has higher priority and budget, still pick scurr */ > + if ( !is_idle_vcpu(current) && > + vcpu_runnable(current) && > + scurr->cur_budget > 0 && > + ( is_idle_vcpu(snext->vcpu) || > + scurr->cur_deadline <= snext->cur_deadline ) ) { > + snext = scurr; > + } > + } > + > + if ( snext != scurr && > + !is_idle_vcpu(current) && > + vcpu_runnable(current) ) { > + set_bit(__RT_delayed_runq_add, &scurr->flags); > + } > + > + snext->last_start = now; > + ret.migrated = 0; > + if ( !is_idle_vcpu(snext->vcpu) ) { > + if ( snext != scurr ) { > + __runq_remove(snext); > + set_bit(__RT_scheduled, &snext->flags); > + } > + if ( snext->vcpu->processor != cpu ) { > + snext->vcpu->processor = cpu; > + ret.migrated = 1; > + } > + } > + > + ret.time = MILLISECS(1); > + ret.task = snext->vcpu; > + > + return ret; > +} > + > +/* > + * Remove VCPU from RunQ > + * The lock is already grabbed in schedule.c, no need to lock here > + */ > +static void > +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + > + BUG_ON( is_idle_vcpu(vc) ); > + > + if ( curr_on_cpu(vc->processor) == vc ) { > + cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ); > + return; > + } > + > + if ( __vcpu_on_runq(svc) ) { > + __runq_remove(svc); > + } > + As said, can it actually happen that we try to put to sleep a vcpu that's not in the RunQ. If yes, fine, but I'd like a comment about why that's the case. If not, let's convert the if to an ASSERT. Let me state once more that I'm not questioning the correctness of your code, I just want to be sure that the final result is as straightforward to read and understand as possible, when it comes to study or debug it. Well, this, in my opinion, is one of the spots where having the said aid (comment or ASSERT) from the original implementors would help a lot with that goal. :-) > + clear_bit(__RT_delayed_runq_add, &svc->flags); > +} > + > +/* > + * Pick a vcpu on a cpu to kick out to place the running candidate > + * Called by wake() and context_saved() > + * We have a running candidate here, the kick logic is: > + * Among all the cpus that are within the cpu affinity > + * 1) if the new->cpu is idle, kick it. This could benefit cache hit > + * 2) if there are any idle vcpu, kick it. 
> + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one > + * if snext has higher priority, kick it. > + * > + * TODO: > + * 1) what if these two vcpus belongs to the same domain? > + * replace a vcpu belonging to the same domain introduces more overhead > + * > + * lock is grabbed before calling this function > + */ Now this is a really good looking doc comment!! :-D > +static void > +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new) > +{ > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * scheduled = NULL; /* lowest priority scheduled */ > Can I suggest a better name for this variable? Maybe latest_deadline_vcpu, or latest_scheduled, or just latest... In general, I'd like it to reflect the concept that it points to the VCPU with the latest deadline (i.e., the lowest prio, but as repeatedly said, we're dealing with EDF only!). > + struct rt_vcpu * iter_svc; > + struct vcpu * iter_vc; > + int cpu = 0; > + cpumask_t not_tickled; /* not tickled cpus */ > Comment not necessary, the name is talking enough by itself in this case. One issue here is that we really are tying hard to avoid putting cpumask_t on the stack, as they can be big on large boxes, with lots of cpus. However, let's consider this low prio for now. > + cpumask_t *online; > + > + if ( new == NULL || is_idle_vcpu(new->vcpu) ) return; > + > + online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool); > + cpumask_and(¬_tickled, online, &prv->cpus); > + cpumask_and(¬_tickled, ¬_tickled, new->vcpu->cpu_hard_affinity); > + cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); > + > + /* 1) if new's previous cpu is idle, kick it for cache benefit */ > + if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) { > Remember: '{' goes on new line while inside the hypervisor. > + cpumask_set_cpu(new->vcpu->processor, &prv->tickled); > + cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ); > + return; > + } > + > + /* 2) if there are any idle pcpu, kick it */ > + /* The same loop also find the one with lowest priority */ > + for_each_cpu(cpu, ¬_tickled) { > + iter_vc = curr_on_cpu(cpu); > + if ( is_idle_vcpu(iter_vc) ) { > + cpumask_set_cpu(cpu, &prv->tickled); > + cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); > + return; > + } > + iter_svc = RT_VCPU(iter_vc); > + if ( scheduled == NULL || > + iter_svc->cur_deadline > scheduled->cur_deadline ) { > + scheduled = iter_svc; > + } > + } > + > + /* 3) candicate has higher priority, kick out the lowest priority vcpu */ > + if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) { > + cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled); > + cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ); > + } > + return; > +} > + > +/* > + * Should always wake up runnable vcpu, put it back to RunQ. > + * Check priority to raise interrupt > + * The lock is already grabbed in schedule.c, no need to lock here > + * TODO: what if these two vcpus belongs to the same domain? > + */ > +static void > +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + s_time_t diff; > + s_time_t now = NOW(); > + long count = 0; > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * snext = NULL; /* highest priority on RunQ */ > + > + BUG_ON( is_idle_vcpu(vc) ); > + > + if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return; > + ASSERT or comment. 
Also, here and everywhere else where this happens, 'return' (in general the then branch of the if) goes on new line, even if it's a single statement. > + /* on RunQ, just update info is ok */ > + if ( unlikely(__vcpu_on_runq(svc)) ) return; > + This happens to have a comment, but not as good as the case requires. :-P > + /* If context hasn't been saved yet, set flag so it will add later */ > + if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) { > + set_bit(__RT_delayed_runq_add, &svc->flags); > + return; > + } > + Credit2 implementation has proper comments on what this flag (and the other one too, actually) means and how it's used, both when the flags are defined and used. Either you provide similar comments in here (taking them from sched_credit2.c is ok), or you somehow reference those in sched_credit2.c. Personally, I don't like referencing other comments from other file, so I'd go for the cut-&-past-ish approach. > + /* update deadline info */ > + diff = now - svc->cur_deadline; > + if ( diff >= 0 ) { > + count = ( diff/MILLISECS(svc->period) ) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > + } > + And here it comes the same code again. Definitely, make an helper function and call it from all the 3 spots. > + __runq_insert(ops, svc); > + __repl_update(ops, now); > So, every time a VCPU wakes up, we scan the full RunQ,, which also means grabbing and holding the global RunQ lock, to check whether any task needs a replenishment... Why is this required? Once more, can't we use timers for replenishments, and get rid of this? I'm sure it comes at a rather high cost, doesn't it? > + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL valid cpus */ > + runq_tickle(ops, snext); > + > + return; > +} > + > +/* > + * scurr has finished context switch, insert it back to the RunQ, > + * and then pick the highest priority vcpu from runq to run > + */ > +static void > +rt_context_saved(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * svc = RT_VCPU(vc); > + struct rt_vcpu * snext = NULL; > + struct rt_private * prv = RT_PRIV(ops); > + spinlock_t *lock; > + > + clear_bit(__RT_scheduled, &svc->flags); > You clear this flag outside of the critical section on lock... This is different from what happens in credit2, and it does not sound safe to me. Is it intentional? > + if ( is_idle_vcpu(vc) ) return; > + > + lock = vcpu_schedule_lock_irq(vc); > + if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && > + likely(vcpu_runnable(vc)) ) { > + __runq_insert(ops, svc); > + __repl_update(ops, NOW()); ditto. > + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL cpus */ > + runq_tickle(ops, snext); > I also think that, if we use something like timers for replenishments, we can avoid calling __runq_pick() here too, as we won't be re-sorting the queue, so it'd be enough to runq_tickle() for svc... I think. :-) > + } > + vcpu_schedule_unlock_irq(lock, vc); > +} > + > +/* > + * set/get each vcpu info of each domain > + */ > +static int > +rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op) > Lots of long lines in this function... 
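To give the timer idea a more concrete shape, here is a very rough sketch of a per-scheduler replenishment timer. It assumes a 'struct timer repl_timer' field added to rt_private and initialised in rt_init() with init_timer(&prv->repl_timer, repl_timer_handler, (void *)ops, 0); the handler name, that field and the rt_update_deadline() helper sketched earlier are all made up, the rest is taken from the patch:

/* needs #include <xen/timer.h> in sched_rt.c */
static void
repl_timer_handler(void *data)
{
    const struct scheduler *ops = data;
    struct rt_private *prv = RT_PRIV(ops);
    struct list_head *runq = RUNQ(ops);
    struct list_head *iter, *tmp;
    s_time_t now = NOW(), next = 0;
    unsigned long flags;

    spin_lock_irqsave(&prv->lock, flags);

    list_for_each_safe( iter, tmp, runq )
    {
        struct rt_vcpu *svc = __runq_elem(iter);

        if ( now >= svc->cur_deadline )
        {
            rt_update_deadline(svc, now);
            /* re-sort, as __repl_update() does today */
            __runq_remove(svc);
            __runq_insert(ops, svc);
        }

        if ( next == 0 || svc->cur_deadline < next )
            next = svc->cur_deadline;
    }

    /* re-arm for the earliest pending deadline, if there is one */
    if ( next )
        set_timer(&prv->repl_timer, next);

    spin_unlock_irqrestore(&prv->lock, flags);
}

rt_vcpu_wake()/__runq_insert() would then (re)arm the timer whenever the newly queued vcpu's deadline is earlier than the one currently programmed, and rt_deinit() would kill_timer() it. With that in place, rt_schedule() and rt_vcpu_wake() would not need to call __repl_update() at all.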
> +{ > + xen_domctl_sched_rt_params_t local_sched; > + struct rt_dom * const sdom = RT_DOM(d); > + struct list_head *iter; > + int vcpu_index = 0; > + int rc = -EINVAL; > + > + switch ( op->cmd ) > + { > + case XEN_DOMCTL_SCHEDOP_getinfo: > + /* for debug use, whenever adjust Dom0 parameter, do global dump */ > + if ( d->domain_id == 0 ) { > + rt_dump(ops); > + } > + > + local_sched.num_vcpus = 0; > + list_for_each( iter, &sdom->vcpu ) { > + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); > + > + ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS); > + local_sched.vcpus[vcpu_index].budget = svc->budget; > + local_sched.vcpus[vcpu_index].period = svc->period; > + > + vcpu_index++; > + } > + local_sched.num_vcpus = vcpu_index; > + copy_to_guest(op->u.rt.schedule, &local_sched, 1); > + rc = 0; > + break; > + case XEN_DOMCTL_SCHEDOP_putinfo: > + copy_from_guest(&local_sched, op->u.rt.schedule, 1); > + list_for_each( iter, &sdom->vcpu ) { > + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); > + > + if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */ > + vcpu_index = local_sched.vcpu_index; > + > + if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) { > + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n", > + vcpu_index); > + }else{ > + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n", > + vcpu_index, local_sched.vcpus[vcpu_index].period, > + local_sched.vcpus[vcpu_index].budget); > + } > + > + svc->period = local_sched.vcpus[vcpu_index].period; > + svc->budget = local_sched.vcpus[vcpu_index].budget; > + > + break; > + } > + } > + rc = 0; > + break; > + } > + > + return rc; > +} > + > +static struct rt_private _rt_priv; > + > +const struct scheduler sched_rt_def = { > + .name = "SMP RT Scheduler", > + .opt_name = "rt", > + .sched_id = XEN_SCHEDULER_RT, > + .sched_data = &_rt_priv, > + > + .dump_cpu_state = rt_dump_pcpu, > + .dump_settings = rt_dump, > + .init = rt_init, > + .deinit = rt_deinit, > + .alloc_pdata = rt_alloc_pdata, > + .free_pdata = rt_free_pdata, > + .alloc_domdata = rt_alloc_domdata, > + .free_domdata = rt_free_domdata, > + .init_domain = rt_dom_init, > + .destroy_domain = rt_dom_destroy, > + .alloc_vdata = rt_alloc_vdata, > + .free_vdata = rt_free_vdata, > + .insert_vcpu = rt_vcpu_insert, > + .remove_vcpu = rt_vcpu_remove, > + > + .adjust = rt_dom_cntl, > + > + .pick_cpu = rt_cpu_pick, > + .do_schedule = rt_schedule, > + .sleep = rt_vcpu_sleep, > + .wake = rt_vcpu_wake, > + .context_saved = rt_context_saved, > + > + .yield = NULL, > + .migrate = NULL, > You can skip the hunks that you want to be NULL. If you don't, because you think it's worthwhile to give a more visual and explicit info about a particular hunk being not implemented, put down a comment on why that is the case. It think, in this case, it's good to show the reviewers and future readers/hackers that we don't have yield and migrate here, so ack on the "=NULL", but do add the comment on what that is. > --- a/xen/include/xen/sched-if.h > +++ b/xen/include/xen/sched-if.h > @@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def; > extern const struct scheduler sched_credit_def; > extern const struct scheduler sched_credit2_def; > extern const struct scheduler sched_arinc653_def; > - > +extern const struct scheduler sched_rt_def; > Don't kill blank lines, just add your stuff. Phew... that's it for this patch... let's move to (hopefully) simpler ones! 
:-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
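As for the request above to document the two flags the way sched_credit2.c does, something along these lines (wording adapted from credit2, so it needs double checking against what the rt code actually does) could replace the bare #defines:

/*
 * __RT_scheduled: the vcpu is either running on, or being context-switched
 * off, a physical cpu, so it must not be put back on the runqueue yet;
 * rt_vcpu_wake() sets __RT_delayed_runq_add instead.
 *
 * __RT_delayed_runq_add: the vcpu became runnable (or was preempted while
 * still runnable) while __RT_scheduled was set; rt_context_saved() does the
 * actual runqueue insertion once the context switch has completed.
 */
#define __RT_scheduled          1
#define RT_scheduled            (1<<__RT_scheduled)
#define __RT_delayed_runq_add   2
#define RT_delayed_runq_add     (1<<__RT_delayed_runq_add)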
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu 2014-07-11 14:27 ` Dario Faggioli @ 2014-07-11 14:37 ` Andrew Cooper 2014-07-11 15:21 ` Dario Faggioli 1 sibling, 1 reply; 47+ messages in thread From: Andrew Cooper @ 2014-07-11 14:37 UTC (permalink / raw) To: Meng Xu, xen-devel Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xumengpanda, lichong659, dgolomb On 11/07/14 05:49, Meng Xu wrote: > This is the core rt scheduler patch. It adds the new real time scheduler to > the hypervisor, as the non-default scheduler. > > This scheduler follows the pre-emptive Global EDF theory in real-time field. > Each VCPU can have a dedicated period and budget. > While scheduled, a VCPU burns its budget. > A VCPU has its budget replenished at the beginning of each of its periods; > The VCPU discards its unused budget at the end of each of its periods. > If a VCPU runs out of budget in a period, it has to wait until next period. > The mechanism of how to burn a VCPU's budget depends on the server mechanism > implemented for each VCPU. > > Server mechanism: a VCPU is implemented as a deferable server. > When a VCPU has a task running on it, its budget is continuously > burned; > When a VCPU has no task but with budget left, its budget is > preserved. > > Priority scheme: Preemptive Global Earliest Deadline First (gEDF). > At any scheduling point, the VCPU with earliest deadline has highest > priority. > > Queue scheme: A global runqueue for each CPU pool. > The runqueue holds all runnable VCPUs. > VCPUs in the runqueue are divided into two parts: with and without budget. > At each part, VCPUs are sorted based on EDF priority scheme. > > Scheduling quanta: 1 ms; but accounting the budget is in microsecond. > > Note: cpumask and cpupool is supported. > > This is still in the development phase. > > Signed-off-by: Sisu Xi <xisisu@gmail.com> > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> Some notes from a hypervisor point of view. > --- > xen/common/Makefile | 1 + > xen/common/sched_rt.c | 984 +++++++++++++++++++++++++++++++++++++++++++ > xen/common/schedule.c | 1 + > xen/include/public/domctl.h | 19 + > xen/include/xen/sched-if.h | 2 +- > 5 files changed, 1006 insertions(+), 1 deletion(-) > create mode 100644 xen/common/sched_rt.c > > diff --git a/xen/common/Makefile b/xen/common/Makefile > index 3683ae3..5a23aa4 100644 > --- a/xen/common/Makefile > +++ b/xen/common/Makefile > @@ -26,6 +26,7 @@ obj-y += sched_credit.o > obj-y += sched_credit2.o > obj-y += sched_sedf.o > obj-y += sched_arinc653.o > +obj-y += sched_rt.o > obj-y += schedule.o > obj-y += shutdown.o > obj-y += softirq.o > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c > new file mode 100644 > index 0000000..41543a2 > --- /dev/null > +++ b/xen/common/sched_rt.c > @@ -0,0 +1,984 @@ > +/****************************************************************************** > + * Preemptive Global Earliest Deadline First (EDF) scheduler for Xen > + * EDF scheduling is one of most popular real-time scheduling algorithm used in > + * embedded field. 
> + * > + * by Sisu Xi, 2013, Washington University in Saint Louis > + * and Meng Xu, 2014, University of Pennsylvania > + * > + * based on the code of credit Scheduler > + */ > + > +#include <xen/config.h> > +#include <xen/init.h> > +#include <xen/lib.h> > +#include <xen/sched.h> > +#include <xen/domain.h> > +#include <xen/delay.h> > +#include <xen/event.h> > +#include <xen/time.h> > +#include <xen/perfc.h> > +#include <xen/sched-if.h> > +#include <xen/softirq.h> > +#include <asm/atomic.h> > +#include <xen/errno.h> > +#include <xen/trace.h> > +#include <xen/cpu.h> > +#include <xen/keyhandler.h> > +#include <xen/trace.h> > +#include <xen/guest_access.h> > + > +/* > + * TODO: > + * > + * Migration compensation and resist like credit2 to better use cache; > + * Lock Holder Problem, using yield? > + * Self switch problem: VCPUs of the same domain may preempt each other; > + */ > + > +/* > + * Design: > + * > + * This scheduler follows the Preemptive Global EDF theory in real-time field. > + * Each VCPU can have a dedicated period and budget. > + * While scheduled, a VCPU burns its budget. > + * A VCPU has its budget replenished at the beginning of each of its periods; > + * The VCPU discards its unused budget at the end of each of its periods. > + * If a VCPU runs out of budget in a period, it has to wait until next period. > + * The mechanism of how to burn a VCPU's budget depends on the server mechanism > + * implemented for each VCPU. > + * > + * Server mechanism: a VCPU is implemented as a deferable server. > + * When a VCPU has a task running on it, its budget is continuously burned; > + * When a VCPU has no task but with budget left, its budget is preserved. > + * > + * Priority scheme: Preemptive Global Earliest Deadline First (gEDF). > + * At any scheduling point, the VCPU with earliest deadline has highest priority. > + * > + * Queue scheme: A global runqueue for each CPU pool. > + * The runqueue holds all runnable VCPUs. > + * VCPUs in the runqueue are divided into two parts: with and without remaining budget. > + * At each part, VCPUs are sorted based on EDF priority scheme. > + * > + * Scheduling quanta: 1 ms; but accounting the budget is in microsecond. > + * > + * Note: cpumask and cpupool is supported. > + */ > + > +/* > + * Locking: > + * Just like credit2, a global system lock is used to protect the RunQ. > + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus. 
> + * > + * The lock is already grabbed when calling wake/sleep/schedule/ functions in schedule.c > + * > + * The functions involes RunQ and needs to grab locks are: > + * dump, vcpu_insert, vcpu_remove, context_saved, > + */ > + > + > +/* > + * Default parameters in ms > + */ > +#define RT_DEFAULT_PERIOD 10 > +#define RT_DEFAULT_BUDGET 4 > + > +/* > + * Useful macros > + */ > +#define RT_PRIV(_ops) ((struct rt_private *)((_ops)->sched_data)) > +#define RT_VCPU(_vcpu) ((struct rt_vcpu *)(_vcpu)->sched_priv) > +#define RT_DOM(_dom) ((struct rt_dom *)(_dom)->sched_priv) > +#define RUNQ(_ops) (&RT_PRIV(_ops)->runq) > + > +/* > + * Flags > + */ > +#define __RT_scheduled 1 > +#define RT_scheduled (1<<__RT_scheduled) > +#define __RT_delayed_runq_add 2 > +#define RT_delayed_runq_add (1<<__RT_delayed_runq_add) > + > +/* > + * Used to printout debug information > + */ > +#define printtime() ( printk("%d : %3ld.%3ld : %-19s ", smp_processor_id(), NOW()/MILLISECS(1), NOW()%MILLISECS(1)/1000, __func__) ) NOW() is fairly expensive, and using it twice like this is going to lead to inaccurate times in the message. Anything like this should be XENLOG_DEBUG, and smp_processor_id() returns an unsigned number. I would suggest something more like this: #define printtime() \ ({ s_time_t now = NOW(); \ printk(XENLOG_DEBUG "%u : %3ld.%eld : %-19s\n", smp_processor_id(),\ now / MILLISECS(1), (now % MILLISECS(1)) / 1000, \ __func__); }) > + > +/* > + * Systme-wide private data, include a global RunQueue > + * The global lock is referenced by schedule_data.schedule_lock from all physical cpus. > + * It can be grabbed via vcpu_schedule_lock_irq() > + */ > +struct rt_private { > + spinlock_t lock; /* The global coarse grand lock */ > + struct list_head sdom; /* list of availalbe domains, used for dump */ > + struct list_head runq; /* Ordered list of runnable VMs */ > + cpumask_t cpus; /* cpumask_t of available physical cpus */ > + cpumask_t tickled; /* another cpu in the queue already ticked this one */ > +}; > + > +/* > + * Virtual CPU > + */ > +struct rt_vcpu { > + struct list_head runq_elem; /* On the runqueue list */ > + struct list_head sdom_elem; /* On the domain VCPU list */ > + > + /* Up-pointers */ > + struct rt_dom *sdom; > + struct vcpu *vcpu; > + > + /* VCPU parameters, in milliseconds */ > + s_time_t period; > + s_time_t budget; > + > + /* VCPU current infomation */ > + long cur_budget; /* current budget in microseconds */ Can budget be negative ? > + s_time_t last_start; /* last start time, used to calculate budget */ > + s_time_t cur_deadline; /* current deadline, used to do EDF */ > + unsigned flags; /* mark __RT_scheduled, etc.. */ > +}; > + > +/* > + * Domain > + */ > +struct rt_dom { > + struct list_head vcpu; /* link its VCPUs */ > + struct list_head sdom_elem; /* link list on rt_priv */ > + struct domain *dom; /* pointer to upper domain */ > +}; > + > +/* > + * RunQueue helper functions > + */ > +static int > +__vcpu_on_runq(struct rt_vcpu *svc) > +{ > + return !list_empty(&svc->runq_elem); > +} > + > +static struct rt_vcpu * > +__runq_elem(struct list_head *elem) > +{ > + return list_entry(elem, struct rt_vcpu, runq_elem); > +} > + > +static inline void > +__runq_remove(struct rt_vcpu *svc) > +{ > + if ( __vcpu_on_runq(svc) ) > + list_del_init(&svc->runq_elem); > +} > + > +/* > + * Insert a vcpu in the RunQ basing on vcpu's deadline: > + * vcpus with shorter deadline are inserted first. 
> + * EDF schedule policy: vcpu with smaller deadline has higher priority; > + * When vcpus have the same deadline, insert the current one at the head of these vcpus. > + */ > +static void > +__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + The line above this has trailing whitespace, which you should clean up > + ASSERT( spin_is_locked(per_cpu(schedule_data, svc->vcpu->processor).schedule_lock) ); Beware assertions involving spin_is_locked(). It checks that someone somewhere has the schedule lock held, not necessarily that this current pcpu is holding the schedule lock. Having the assertion is better than not having it, but do be aware that it doesn't catch every case you might expect it to. > + > + if ( __vcpu_on_runq(svc) ) > + return; > + > + list_for_each(iter, runq) { Coding style - braces on next line. > + struct rt_vcpu * iter_svc = __runq_elem(iter); > + > + if ( svc->cur_budget > 0 ) { /* svc still has budget */ > + if ( iter_svc->cur_budget == 0 || > + svc->cur_deadline <= iter_svc->cur_deadline ) > + break; > + } else { /* svc has no budget */ > + if ( iter_svc->cur_budget == 0 && > + svc->cur_deadline <= iter_svc->cur_deadline ) > + break; > + } > + } > + > + list_add_tail(&svc->runq_elem, iter); > +} > + > + > +/* > + * Debug related code, dump vcpu/cpu information > + */ > +static void > +rt_dump_vcpu(struct rt_vcpu *svc) > +{ > + if ( svc == NULL ) { > + printk("NULL!\n"); > + return; > + } > +#define cpustr keyhandler_scratch char *cpustr = keyhandler_scratch; Less macroism and more typesafety. > + cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity); > + printk("[%5d.%-2d] cpu %d, (%"PRId64", %"PRId64"), cur_b=%"PRId64" cur_d=%"PRId64" last_start=%"PRId64" onR=%d runnable=%d cpu_hard_affinity=%s ", > + svc->vcpu->domain->domain_id, > + svc->vcpu->vcpu_id, > + svc->vcpu->processor, > + svc->period, > + svc->budget, > + svc->cur_budget, > + svc->cur_deadline, > + svc->last_start, > + __vcpu_on_runq(svc), > + vcpu_runnable(svc->vcpu), > + cpustr); > + memset(cpustr, 0, sizeof(char)*1024); > + cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool)); > + printk("cpupool=%s\n", cpustr); > +#undef cpustr > +} > + > +static void > +rt_dump_pcpu(const struct scheduler *ops, int cpu) cpu ids should be unsigned. > +{ > + struct rt_vcpu *svc = RT_VCPU(curr_on_cpu(cpu)); > + > + printtime(); > + rt_dump_vcpu(svc); > +} > + > +/* > + * should not need lock here. 
only showing stuff > + */ > +static void > +rt_dump(const struct scheduler *ops) > +{ > + struct list_head *iter_sdom, *iter_svc, *runq, *iter; > + struct rt_private *prv = RT_PRIV(ops); > + struct rt_vcpu *svc; > + int cpu = 0; > + int loop = 0; > + > + printtime(); > + printk("Priority Scheme: EDF\n"); > + > + printk("PCPU info: \n"); > + for_each_cpu(cpu, &prv->cpus) { > + rt_dump_pcpu(ops, cpu); > + } > + > + printk("Global RunQueue info: \n"); > + loop = 0; > + runq = RUNQ(ops); > + list_for_each( iter, runq ) { > + svc = __runq_elem(iter); > + printk("\t%3d: ", ++loop); > + rt_dump_vcpu(svc); > + } > + > + printk("Domain info: \n"); > + loop = 0; > + list_for_each( iter_sdom, &prv->sdom ) { > + struct rt_dom *sdom; > + sdom = list_entry(iter_sdom, struct rt_dom, sdom_elem); > + printk("\tdomain: %d\n", sdom->dom->domain_id); > + > + list_for_each( iter_svc, &sdom->vcpu ) { > + svc = list_entry(iter_svc, struct rt_vcpu, sdom_elem); > + printk("\t\t%3d: ", ++loop); > + rt_dump_vcpu(svc); > + } > + } > + > + printk("\n"); > +} > + > +/* > + * Init/Free related code > + */ > +static int > +rt_init(struct scheduler *ops) > +{ > + struct rt_private *prv; > + > + prv = xzalloc(struct rt_private); These 3 lines can be condensed into 1. > + if ( prv == NULL ) { > + printk("xzalloc failed at rt_private\n"); SCHED_OP(..., init) will bail on error. No need to printk() as well. > + return -ENOMEM; > + } > + > + ops->sched_data = prv; > + spin_lock_init(&prv->lock); > + INIT_LIST_HEAD(&prv->sdom); > + INIT_LIST_HEAD(&prv->runq); > + cpumask_clear(&prv->tickled); > + cpumask_clear(&prv->cpus); Using the xzalloc(), you don't need to re-0 elements, so these cpumask_clear()s can be dropped. > + > + printtime(); > + printk("\n"); > + > + return 0; > +} > + > +static void > +rt_deinit(const struct scheduler *ops) > +{ > + struct rt_private *prv; > + > + printtime(); > + printk("\n"); > + > + prv = RT_PRIV(ops); this line can be joined with the declaration of *prv. > + if ( prv ) > + xfree(prv); xfree() works just like free(). Please drop the NULL check. > +} > + > +/* > + * point per_cpu spinlock to the global system lock; all cpu have same global system lock > + */ > +static void * > +rt_alloc_pdata(const struct scheduler *ops, int cpu) Eww. Not your fault, but struct scheduler needs its idea of cpus to turn less signed. > +{ > + struct rt_private *prv = RT_PRIV(ops); > + > + cpumask_set_cpu(cpu, &prv->cpus); > + > + per_cpu(schedule_data, cpu).schedule_lock = &prv->lock; > + > + printtime(); > + printk("%s total cpus: %d", __FUNCTION__, cpumask_weight(&prv->cpus)); > + return (void *)1; This is going to end in disaster, as this_cpu(schedule_data).sched_priv now contains a bogus pointer. 
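On the (void *)1 issue just mentioned, a minimal sketch of the usual pattern (compare csched_pcpu in sched_credit.c) would be to hand back a real, even if nearly empty, per-pcpu allocation, so sched_priv never holds a bogus pointer. struct rt_pcpu and its field are invented for the sketch, they are not in the posted patch:

struct rt_pcpu {
    unsigned int cpu;    /* nothing else is needed by the scheduler so far */
};

static void *
rt_alloc_pdata(const struct scheduler *ops, int cpu)
{
    struct rt_private *prv = RT_PRIV(ops);
    struct rt_pcpu *spc = xzalloc(struct rt_pcpu);

    if ( spc == NULL )
        return NULL;

    spc->cpu = cpu;
    cpumask_set_cpu(cpu, &prv->cpus);
    per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;

    return spc;
}

rt_free_pdata() would then xfree() the structure instead of ignoring the pointer.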
> +} > + > +static void > +rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) > +{ > + struct rt_private * prv = RT_PRIV(ops); > + cpumask_clear_cpu(cpu, &prv->cpus); > + printtime(); > + printk("%s cpu=%d\n", __FUNCTION__, cpu); > +} > + > +static void * > +rt_alloc_domdata(const struct scheduler *ops, struct domain *dom) > +{ > + unsigned long flags; > + struct rt_dom *sdom; > + struct rt_private * prv = RT_PRIV(ops); > + > + printtime(); > + printk("dom=%d\n", dom->domain_id); > + > + sdom = xzalloc(struct rt_dom); > + if ( sdom == NULL ) { > + printk("%s, xzalloc failed\n", __func__); > + return NULL; > + } > + > + INIT_LIST_HEAD(&sdom->vcpu); > + INIT_LIST_HEAD(&sdom->sdom_elem); > + sdom->dom = dom; > + > + /* spinlock here to insert the dom */ > + spin_lock_irqsave(&prv->lock, flags); > + list_add_tail(&sdom->sdom_elem, &(prv->sdom)); > + spin_unlock_irqrestore(&prv->lock, flags); > + > + return (void *)sdom; > +} > + > +static void > +rt_free_domdata(const struct scheduler *ops, void *data) > +{ > + unsigned long flags; > + struct rt_dom *sdom = data; > + struct rt_private * prv = RT_PRIV(ops); Please be consistent with *s in pointers declarations. Prevailing style is lacking a space. > + > + printtime(); > + printk("dom=%d\n", sdom->dom->domain_id); > + > + spin_lock_irqsave(&prv->lock, flags); > + list_del_init(&sdom->sdom_elem); > + spin_unlock_irqrestore(&prv->lock, flags); Why irqsave ? > + xfree(data); > +} > + > +static int > +rt_dom_init(const struct scheduler *ops, struct domain *dom) > +{ > + struct rt_dom *sdom; > + > + printtime(); > + printk("dom=%d\n", dom->domain_id); > + > + /* IDLE Domain does not link on rt_private */ > + if ( is_idle_domain(dom) ) { return 0; } Coding style - extranious braces and statement on same line as if. > + > + sdom = rt_alloc_domdata(ops, dom); > + if ( sdom == NULL ) { > + printk("%s, failed\n", __func__); > + return -ENOMEM; > + } > + dom->sched_priv = sdom; > + > + return 0; > +} > + > +static void > +rt_dom_destroy(const struct scheduler *ops, struct domain *dom) > +{ > + printtime(); > + printk("dom=%d\n", dom->domain_id); > + > + rt_free_domdata(ops, RT_DOM(dom)); > +} > + > +static void * > +rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd) > +{ > + struct rt_vcpu *svc; > + s_time_t now = NOW(); > + long count; > + > + /* Allocate per-VCPU info */ > + svc = xzalloc(struct rt_vcpu); > + if ( svc == NULL ) { > + printk("%s, xzalloc failed\n", __func__); > + return NULL; > + } > + > + INIT_LIST_HEAD(&svc->runq_elem); > + INIT_LIST_HEAD(&svc->sdom_elem); > + svc->flags = 0U; > + svc->sdom = dd; > + svc->vcpu = vc; > + svc->last_start = 0; /* init last_start is 0 */ > + > + svc->period = RT_DEFAULT_PERIOD; > + if ( !is_idle_vcpu(vc) && vc->domain->domain_id != 0 ) { > + svc->budget = RT_DEFAULT_BUDGET; > + } else { > + svc->budget = RT_DEFAULT_PERIOD; /* give vcpus of dom0 100% utilization */ > + } > + > + count = (now/MILLISECS(svc->period)) + 1; Coding style = spaces around binary operators. 
> + /* sync all VCPU's start time to 0 */ > + svc->cur_deadline += count*MILLISECS(svc->period); > + > + svc->cur_budget = svc->budget*1000; /* counting in microseconds level */ > + /* Debug only: dump new vcpu's info */ > + printtime(); > + rt_dump_vcpu(svc); > + > + return svc; > +} > + > +static void > +rt_free_vdata(const struct scheduler *ops, void *priv) > +{ > + struct rt_vcpu *svc = priv; > + > + /* Debug only: dump freed vcpu's info */ > + printtime(); > + rt_dump_vcpu(svc); > + xfree(svc); > +} > + > +static void > +rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu *svc = RT_VCPU(vc); > + > + /* Debug only: dump info of vcpu to insert */ > + printtime(); > + rt_dump_vcpu(svc); > + > + /* IDLE VCPU not allowed on RunQ */ > + if ( is_idle_vcpu(vc) ) > + return; > + > + list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu); /* add to dom vcpu list */ > +} > + > +static void > +rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + struct rt_dom * const sdom = svc->sdom; Why are the pointers themselves const? > + > + printtime(); > + rt_dump_vcpu(svc); > + > + BUG_ON( sdom == NULL ); > + BUG_ON( __vcpu_on_runq(svc) ); > + > + if ( !is_idle_vcpu(vc) ) { > + list_del_init(&svc->sdom_elem); > + } > +} > + > +/* > + * Pick a valid CPU for the vcpu vc > + * Valid CPU of a vcpu is intesection of vcpu's affinity and available cpus > + */ > +static int > +rt_cpu_pick(const struct scheduler *ops, struct vcpu *vc) > +{ > + cpumask_t cpus; > + cpumask_t *online; > + int cpu; > + struct rt_private * prv = RT_PRIV(ops); > + > + online = cpupool_scheduler_cpumask(vc->domain->cpupool); > + cpumask_and(&cpus, &prv->cpus, online); > + cpumask_and(&cpus, &cpus, vc->cpu_hard_affinity); > + > + cpu = cpumask_test_cpu(vc->processor, &cpus) > + ? vc->processor > + : cpumask_cycle(vc->processor, &cpus); > + ASSERT( !cpumask_empty(&cpus) && cpumask_test_cpu(cpu, &cpus) ); > + > + return cpu; > +} > + > +/* > + * Implemented as deferrable server. > + * Different server mechanism has different implementation. > + * burn budget at microsecond level. > + */ > +static void > +burn_budgets(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now) { > + s_time_t delta; > + unsigned int consume; > + long count = 0; > + > + /* first time called for this svc, update last_start */ > + if ( svc->last_start == 0 ) { > + svc->last_start = now; > + return; > + } > + > + /* don't burn budget for idle VCPU */ > + if ( is_idle_vcpu(svc->vcpu) ) { > + return; > + } > + > + /* don't burn budget for Domain-0, RT-Xen use only */ > + if ( svc->sdom->dom->domain_id == 0 ) { > + return; > + } > + > + /* update deadline info */ > + delta = now - svc->cur_deadline; > + if ( delta >= 0 ) { > + count = ( delta/MILLISECS(svc->period) ) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > + return; > + } > + > + delta = now - svc->last_start; > + if ( delta < 0 ) { > + printk("%s, delta = %ld for ", __func__, delta); > + rt_dump_vcpu(svc); > + svc->last_start = now; /* update last_start */ > + svc->cur_budget = 0; > + return; > + } > + > + if ( svc->cur_budget == 0 ) return; > + > + /* burn at microseconds level */ > + consume = ( delta/MICROSECS(1) ); > + if ( delta%MICROSECS(1) > MICROSECS(1)/2 ) consume++; > + > + svc->cur_budget -= consume; > + if ( svc->cur_budget < 0 ) svc->cur_budget = 0; > +} > + > +/* > + * RunQ is sorted. Pick first one within cpumask. 
If no one, return NULL > + * lock is grabbed before calling this function > + */ > +static struct rt_vcpu * > +__runq_pick(const struct scheduler *ops, cpumask_t mask) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + struct rt_vcpu *svc = NULL; > + struct rt_vcpu *iter_svc = NULL; > + cpumask_t cpu_common; > + cpumask_t *online; > + struct rt_private * prv = RT_PRIV(ops); > + > + list_for_each(iter, runq) { > + iter_svc = __runq_elem(iter); > + > + /* mask is intersection of cpu_hard_affinity and cpupool and priv->cpus */ > + online = cpupool_scheduler_cpumask(iter_svc->vcpu->domain->cpupool); > + cpumask_and(&cpu_common, online, &prv->cpus); > + cpumask_and(&cpu_common, &cpu_common, iter_svc->vcpu->cpu_hard_affinity); > + cpumask_and(&cpu_common, &mask, &cpu_common); > + if ( cpumask_empty(&cpu_common) ) > + continue; > + > + if ( iter_svc->cur_budget <= 0 ) > + continue; > + > + svc = iter_svc; > + break; > + } > + > + return svc; > +} > + > +/* > + * Update vcpu's budget and sort runq by insert the modifed vcpu back to runq > + * lock is grabbed before calling this function > + */ > +static void > +__repl_update(const struct scheduler *ops, s_time_t now) > +{ > + struct list_head *runq = RUNQ(ops); > + struct list_head *iter; > + struct list_head *tmp; > + struct rt_vcpu *svc = NULL; > + > + s_time_t diff; > + long count; > + > + list_for_each_safe(iter, tmp, runq) { > + svc = __runq_elem(iter); > + > + diff = now - svc->cur_deadline; > + if ( diff > 0 ) { > + count = (diff/MILLISECS(svc->period)) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > + __runq_remove(svc); > + __runq_insert(ops, svc); > + } > + } > +} > + > +/* > + * schedule function for rt scheduler. 
> + * The lock is already grabbed in schedule.c, no need to lock here > + */ > +static struct task_slice > +rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled) > +{ > + const int cpu = smp_processor_id(); > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * const scurr = RT_VCPU(current); > + struct rt_vcpu * snext = NULL; > + struct task_slice ret; > + > + /* clear ticked bit now that we've been scheduled */ > + if ( cpumask_test_cpu(cpu, &prv->tickled) ) > + cpumask_clear_cpu(cpu, &prv->tickled); > + > + /* burn_budget would return for IDLE VCPU */ > + burn_budgets(ops, scurr, now); > + > + __repl_update(ops, now); > + > + if ( tasklet_work_scheduled ) { > + snext = RT_VCPU(idle_vcpu[cpu]); > + } else { > + cpumask_t cur_cpu; > + cpumask_clear(&cur_cpu); > + cpumask_set_cpu(cpu, &cur_cpu); > + snext = __runq_pick(ops, cur_cpu); > + if ( snext == NULL ) > + snext = RT_VCPU(idle_vcpu[cpu]); > + > + /* if scurr has higher priority and budget, still pick scurr */ > + if ( !is_idle_vcpu(current) && > + vcpu_runnable(current) && > + scurr->cur_budget > 0 && > + ( is_idle_vcpu(snext->vcpu) || > + scurr->cur_deadline <= snext->cur_deadline ) ) { > + snext = scurr; > + } > + } > + > + if ( snext != scurr && > + !is_idle_vcpu(current) && > + vcpu_runnable(current) ) { > + set_bit(__RT_delayed_runq_add, &scurr->flags); > + } > + > + snext->last_start = now; > + ret.migrated = 0; > + if ( !is_idle_vcpu(snext->vcpu) ) { > + if ( snext != scurr ) { > + __runq_remove(snext); > + set_bit(__RT_scheduled, &snext->flags); > + } > + if ( snext->vcpu->processor != cpu ) { > + snext->vcpu->processor = cpu; > + ret.migrated = 1; > + } > + } > + > + ret.time = MILLISECS(1); > + ret.task = snext->vcpu; > + > + return ret; > +} > + > +/* > + * Remove VCPU from RunQ > + * The lock is already grabbed in schedule.c, no need to lock here > + */ > +static void > +rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + > + BUG_ON( is_idle_vcpu(vc) ); > + > + if ( curr_on_cpu(vc->processor) == vc ) { > + cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ); > + return; > + } > + > + if ( __vcpu_on_runq(svc) ) { > + __runq_remove(svc); > + } > + > + clear_bit(__RT_delayed_runq_add, &svc->flags); > +} > + > +/* > + * Pick a vcpu on a cpu to kick out to place the running candidate > + * Called by wake() and context_saved() > + * We have a running candidate here, the kick logic is: > + * Among all the cpus that are within the cpu affinity > + * 1) if the new->cpu is idle, kick it. This could benefit cache hit > + * 2) if there are any idle vcpu, kick it. > + * 3) now all pcpus are busy, among all the running vcpus, pick lowest priority one > + * if snext has higher priority, kick it. > + * > + * TODO: > + * 1) what if these two vcpus belongs to the same domain? 
> + * replace a vcpu belonging to the same domain introduces more overhead > + * > + * lock is grabbed before calling this function > + */ > +static void > +runq_tickle(const struct scheduler *ops, struct rt_vcpu *new) > +{ > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * scheduled = NULL; /* lowest priority scheduled */ > + struct rt_vcpu * iter_svc; > + struct vcpu * iter_vc; > + int cpu = 0; > + cpumask_t not_tickled; /* not tickled cpus */ > + cpumask_t *online; > + > + if ( new == NULL || is_idle_vcpu(new->vcpu) ) return; > + > + online = cpupool_scheduler_cpumask(new->vcpu->domain->cpupool); > + cpumask_and(¬_tickled, online, &prv->cpus); > + cpumask_and(¬_tickled, ¬_tickled, new->vcpu->cpu_hard_affinity); > + cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); > + > + /* 1) if new's previous cpu is idle, kick it for cache benefit */ > + if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) ) { > + cpumask_set_cpu(new->vcpu->processor, &prv->tickled); > + cpu_raise_softirq(new->vcpu->processor, SCHEDULE_SOFTIRQ); > + return; > + } > + > + /* 2) if there are any idle pcpu, kick it */ > + /* The same loop also find the one with lowest priority */ > + for_each_cpu(cpu, ¬_tickled) { > + iter_vc = curr_on_cpu(cpu); > + if ( is_idle_vcpu(iter_vc) ) { > + cpumask_set_cpu(cpu, &prv->tickled); > + cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); > + return; > + } > + iter_svc = RT_VCPU(iter_vc); > + if ( scheduled == NULL || > + iter_svc->cur_deadline > scheduled->cur_deadline ) { > + scheduled = iter_svc; > + } > + } > + > + /* 3) candicate has higher priority, kick out the lowest priority vcpu */ > + if ( scheduled != NULL && new->cur_deadline < scheduled->cur_deadline ) { > + cpumask_set_cpu(scheduled->vcpu->processor, &prv->tickled); > + cpu_raise_softirq(scheduled->vcpu->processor, SCHEDULE_SOFTIRQ); > + } > + return; > +} > + > +/* > + * Should always wake up runnable vcpu, put it back to RunQ. > + * Check priority to raise interrupt > + * The lock is already grabbed in schedule.c, no need to lock here > + * TODO: what if these two vcpus belongs to the same domain? 
> + */ > +static void > +rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * const svc = RT_VCPU(vc); > + s_time_t diff; > + s_time_t now = NOW(); > + long count = 0; > + struct rt_private * prv = RT_PRIV(ops); > + struct rt_vcpu * snext = NULL; /* highest priority on RunQ */ > + > + BUG_ON( is_idle_vcpu(vc) ); > + > + if ( unlikely(curr_on_cpu(vc->processor) == vc) ) return; > + > + /* on RunQ, just update info is ok */ > + if ( unlikely(__vcpu_on_runq(svc)) ) return; > + > + /* If context hasn't been saved yet, set flag so it will add later */ > + if ( unlikely(test_bit(__RT_scheduled, &svc->flags)) ) { > + set_bit(__RT_delayed_runq_add, &svc->flags); > + return; > + } > + > + /* update deadline info */ > + diff = now - svc->cur_deadline; > + if ( diff >= 0 ) { > + count = ( diff/MILLISECS(svc->period) ) + 1; > + svc->cur_deadline += count * MILLISECS(svc->period); > + svc->cur_budget = svc->budget * 1000; > + } > + > + __runq_insert(ops, svc); > + __repl_update(ops, now); > + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL valid cpus */ > + runq_tickle(ops, snext); > + > + return; > +} > + > +/* > + * scurr has finished context switch, insert it back to the RunQ, > + * and then pick the highest priority vcpu from runq to run > + */ > +static void > +rt_context_saved(const struct scheduler *ops, struct vcpu *vc) > +{ > + struct rt_vcpu * svc = RT_VCPU(vc); > + struct rt_vcpu * snext = NULL; > + struct rt_private * prv = RT_PRIV(ops); > + spinlock_t *lock; > + > + clear_bit(__RT_scheduled, &svc->flags); > + if ( is_idle_vcpu(vc) ) return; > + > + lock = vcpu_schedule_lock_irq(vc); > + if ( test_and_clear_bit(__RT_delayed_runq_add, &svc->flags) && > + likely(vcpu_runnable(vc)) ) { > + __runq_insert(ops, svc); > + __repl_update(ops, NOW()); > + snext = __runq_pick(ops, prv->cpus); /* pick snext from ALL cpus */ > + runq_tickle(ops, snext); > + } > + vcpu_schedule_unlock_irq(lock, vc); > +} > + > +/* > + * set/get each vcpu info of each domain > + */ > +static int > +rt_dom_cntl(const struct scheduler *ops, struct domain *d, struct xen_domctl_scheduler_op *op) > +{ > + xen_domctl_sched_rt_params_t local_sched; > + struct rt_dom * const sdom = RT_DOM(d); > + struct list_head *iter; > + int vcpu_index = 0; > + int rc = -EINVAL; > + > + switch ( op->cmd ) > + { > + case XEN_DOMCTL_SCHEDOP_getinfo: > + /* for debug use, whenever adjust Dom0 parameter, do global dump */ > + if ( d->domain_id == 0 ) { > + rt_dump(ops); > + } > + > + local_sched.num_vcpus = 0; > + list_for_each( iter, &sdom->vcpu ) { > + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); > + > + ASSERT(vcpu_index < XEN_LEGACY_MAX_VCPUS); > + local_sched.vcpus[vcpu_index].budget = svc->budget; > + local_sched.vcpus[vcpu_index].period = svc->period; > + > + vcpu_index++; > + } > + local_sched.num_vcpus = vcpu_index; > + copy_to_guest(op->u.rt.schedule, &local_sched, 1); > + rc = 0; > + break; > + case XEN_DOMCTL_SCHEDOP_putinfo: > + copy_from_guest(&local_sched, op->u.rt.schedule, 1); > + list_for_each( iter, &sdom->vcpu ) { > + struct rt_vcpu * svc = list_entry(iter, struct rt_vcpu, sdom_elem); > + > + if ( local_sched.vcpu_index == svc->vcpu->vcpu_id ) { /* adjust per VCPU parameter */ > + vcpu_index = local_sched.vcpu_index; > + > + if ( vcpu_index < 0 || vcpu_index > XEN_LEGACY_MAX_VCPUS) { > + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d\n", > + vcpu_index); > + }else{ > + printk("XEN_DOMCTL_SCHEDOP_putinfo: vcpu_index=%d, period=%"PRId64", budget=%"PRId64"\n", > 
+ vcpu_index, local_sched.vcpus[vcpu_index].period, > + local_sched.vcpus[vcpu_index].budget); > + } > + > + svc->period = local_sched.vcpus[vcpu_index].period; > + svc->budget = local_sched.vcpus[vcpu_index].budget; > + > + break; > + } > + } > + rc = 0; > + break; > + } > + > + return rc; > +} > + > +static struct rt_private _rt_priv; > + > +const struct scheduler sched_rt_def = { > + .name = "SMP RT Scheduler", > + .opt_name = "rt", > + .sched_id = XEN_SCHEDULER_RT, > + .sched_data = &_rt_priv, > + > + .dump_cpu_state = rt_dump_pcpu, > + .dump_settings = rt_dump, > + .init = rt_init, > + .deinit = rt_deinit, > + .alloc_pdata = rt_alloc_pdata, > + .free_pdata = rt_free_pdata, > + .alloc_domdata = rt_alloc_domdata, > + .free_domdata = rt_free_domdata, > + .init_domain = rt_dom_init, > + .destroy_domain = rt_dom_destroy, > + .alloc_vdata = rt_alloc_vdata, > + .free_vdata = rt_free_vdata, > + .insert_vcpu = rt_vcpu_insert, > + .remove_vcpu = rt_vcpu_remove, > + > + .adjust = rt_dom_cntl, > + > + .pick_cpu = rt_cpu_pick, > + .do_schedule = rt_schedule, > + .sleep = rt_vcpu_sleep, > + .wake = rt_vcpu_wake, > + .context_saved = rt_context_saved, > + > + .yield = NULL, > + .migrate = NULL, > +}; > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > index e9eb0bc..2d13966 100644 > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -68,6 +68,7 @@ static const struct scheduler *schedulers[] = { > &sched_sedf_def, > &sched_credit_def, > &sched_credit2_def, > + &sched_rt_def, > &sched_arinc653_def, > }; > > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > index 5b11bbf..d1a8201 100644 > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -339,6 +339,20 @@ struct xen_domctl_max_vcpus { > typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t; > DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t); > > +/* > + * This structure is used to pass to rt scheduler from a > + * privileged domain to Xen > + */ > +struct xen_domctl_sched_rt_params { > + struct { > + signed long period; /* s_time_t type */ > + signed long budget; No implicitly-width'd types in the public headers. int64_t's should do. > + } vcpus[XEN_LEGACY_MAX_VCPUS]; This is a limit which should be avoided in newly-added public API, as it will force a change at some point in the future. > + uint16_t num_vcpus; > + uint16_t vcpu_index; Can you explain what the purpose of these are? As a gut feel, I think that these two ints should be part of the struct xen_domctl_sched_rt below, with a guest handle pointing to an array of { period, budget } structures, which avoids the LEGACY_MAX limit. > +}; > +typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t; > +DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t); > > /* XEN_DOMCTL_scheduler_op */ > /* Scheduler types. */ > @@ -346,6 +360,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t); > #define XEN_SCHEDULER_CREDIT 5 > #define XEN_SCHEDULER_CREDIT2 6 > #define XEN_SCHEDULER_ARINC653 7 > +#define XEN_SCHEDULER_RT 8 > + > /* Set or get info? */ > #define XEN_DOMCTL_SCHEDOP_putinfo 0 > #define XEN_DOMCTL_SCHEDOP_getinfo 1 > @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { > struct xen_domctl_sched_credit2 { > uint16_t weight; > } credit2; > + struct xen_domctl_sched_rt{ > + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule; > + } rt; Thinking more generally, is rt an appropriate name here? 
It seems a little generic, as there are several classes of realtime schedulers employing different semantics and parameters. > nul > } u; > }; > typedef struct xen_domctl_scheduler_op xen_domctl_scheduler_op_t; > diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h > index 4164dff..e452d32 100644 > --- a/xen/include/xen/sched-if.h > +++ b/xen/include/xen/sched-if.h > @@ -169,7 +169,7 @@ extern const struct scheduler sched_sedf_def; > extern const struct scheduler sched_credit_def; > extern const struct scheduler sched_credit2_def; > extern const struct scheduler sched_arinc653_def; > - > +extern const struct scheduler sched_rt_def; If you keep the double newline, this diff gets smaller. ~Andrew > > struct cpupool > { ^ permalink raw reply [flat|nested] 47+ messages in thread
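To make the gut feel above a bit more concrete, the handle-based layout could look something like the sketch below (names, types and padding are only indicative, not a worked-out interface):

struct xen_domctl_sched_rt_params {
    int64_t period;
    int64_t budget;
};
typedef struct xen_domctl_sched_rt_params xen_domctl_sched_rt_params_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_params_t);

struct xen_domctl_sched_rt {
    /* IN/OUT: guest-provided array of nr_vcpus {period, budget} entries */
    XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) vcpus;
    uint16_t nr_vcpus;
    uint16_t padding[3];
};

That keeps XEN_LEGACY_MAX_VCPUS out of the new interface and lets the getinfo/putinfo paths copy exactly nr_vcpus entries with copy_to_guest()/copy_from_guest().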
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 14:37 ` Andrew Cooper @ 2014-07-11 15:21 ` Dario Faggioli 2014-07-11 15:40 ` Andrew Cooper 0 siblings, 1 reply; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 15:21 UTC (permalink / raw) To: Andrew Cooper Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 3441 bytes --] On ven, 2014-07-11 at 15:37 +0100, Andrew Cooper wrote: > On 11/07/14 05:49, Meng Xu wrote: > > +/* > > + * Virtual CPU > > + */ > > +struct rt_vcpu { > > + struct list_head runq_elem; /* On the runqueue list */ > > + struct list_head sdom_elem; /* On the domain VCPU list */ > > + > > + /* Up-pointers */ > > + struct rt_dom *sdom; > > + struct vcpu *vcpu; > > + > > + /* VCPU parameters, in milliseconds */ > > + s_time_t period; > > + s_time_t budget; > > + > > + /* VCPU current infomation */ > > + long cur_budget; /* current budget in microseconds */ > > Can budget be negative ? > It depends on how you implement the core of the algorithm. Current SEDF uses a cputime counter, which always increase, and checks whether budget is being overcome with something lien (cputime >= budget). Meng's implementation here does the opposite, initializes cur_budget with budget, and subtract actual execution time from there, until it reaches zero or, in case of overrun, below zero. So, yes, if we stick with this version of things (which is perfectly fine), I think cur_budget should be allowed to go negative. However, I'd use s_time_t for it it, rather than long. > > #define XEN_SCHEDULER_CREDIT 5 > > #define XEN_SCHEDULER_CREDIT2 6 > > #define XEN_SCHEDULER_ARINC653 7 > > +#define XEN_SCHEDULER_RT 8 > > + > > /* Set or get info? */ > > #define XEN_DOMCTL_SCHEDOP_putinfo 0 > > #define XEN_DOMCTL_SCHEDOP_getinfo 1 > > @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { > > struct xen_domctl_sched_credit2 { > > uint16_t weight; > > } credit2; > > + struct xen_domctl_sched_rt{ > > + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule; > > + } rt; > > Thinking more generally, is rt an appropriate name here? It seems a > little generic, as there are several classes of realtime schedulers > employing different semantics and parameters. > That's true. The challenge here is finding a name which is representative enough of the specific characteristics of the scheduler, without it being known only in real-time research rooms! :-O I'd stick with sched_rt.c for the filename, as it's an implementation of EDF, after all, on top of which (read: in the same file) you can potentially implement a bunch of different scheduling algorithms and policies. This particular one is called "Deferrable Server" (DS). Since it's on top of EDF, which is a dynamic priority algorithm, it is sometimes called "Dynamic Deferrable Server" (DDS). It's also part of a class of algorithms known as "Resource Reservation" algorithms, which is how (well, one of the ways with which) the RT community refers to algos with a budgetting scheme inside it. So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource Reservation Deferrable Server)? Just DS (Deferrable Server)? I'm not sure I like much any of these... What was it that you had in mind when you said "there are several classes of realtime schedulers employing different semantics and parameters" ? 
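Going back to the budget representation for a moment, the difference between the two accounting styles boils down to something like this sketch (the cputime field and the helper name are invented; cur_budget is assumed to be s_time_t and no longer clamped at zero in burn_budgets()):

static bool_t
rt_vcpu_depleted(const struct rt_vcpu *svc)
{
    /* sedf-style would be: return svc->cputime >= svc->budget; */

    /*
     * With a decrementing, signed cur_budget the check is just as cheap,
     * and a negative value directly encodes the overrun that a later
     * replenishment can charge back.
     */
    return svc->cur_budget <= 0;
}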
Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 15:21 ` Dario Faggioli @ 2014-07-11 15:40 ` Andrew Cooper 2014-07-11 15:48 ` Dario Faggioli 0 siblings, 1 reply; 47+ messages in thread From: Andrew Cooper @ 2014-07-11 15:40 UTC (permalink / raw) To: Dario Faggioli Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb On 11/07/14 16:21, Dario Faggioli wrote: >>> #define XEN_SCHEDULER_CREDIT 5 >>> #define XEN_SCHEDULER_CREDIT2 6 >>> #define XEN_SCHEDULER_ARINC653 7 >>> +#define XEN_SCHEDULER_RT 8 >>> + >>> /* Set or get info? */ >>> #define XEN_DOMCTL_SCHEDOP_putinfo 0 >>> #define XEN_DOMCTL_SCHEDOP_getinfo 1 >>> @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { >>> struct xen_domctl_sched_credit2 { >>> uint16_t weight; >>> } credit2; >>> + struct xen_domctl_sched_rt{ >>> + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_params_t) schedule; >>> + } rt; >> >> Thinking more generally, is rt an appropriate name here? It seems a >> little generic, as there are several classes of realtime schedulers >> employing different semantics and parameters. >> > That's true. The challenge here is finding a name which is > representative enough of the specific characteristics of the scheduler, > without it being known only in real-time research rooms! :-O > > I'd stick with sched_rt.c for the filename, as it's an implementation of > EDF, after all, on top of which (read: in the same file) you can > potentially implement a bunch of different scheduling algorithms and > policies. > > This particular one is called "Deferrable Server" (DS). Since it's on > top of EDF, which is a dynamic priority algorithm, it is sometimes > called "Dynamic Deferrable Server" (DDS). It's also part of a class of > algorithms known as "Resource Reservation" algorithms, which is how > (well, one of the ways with which) the RT community refers to algos with > a budgetting scheme inside it. > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource > Reservation Deferrable Server)? Just DS (Deferrable Server)? > > I'm not sure I like much any of these... > > What was it that you had in mind when you said "there are several > classes of realtime schedulers employing different semantics and > parameters" ? > > Dario > Hmm - naming things is difficult. How about bob? My concern is that naming it "rt" seems too generic, and will cause confusion of a) what type of realtime scheduler it is, and b) further problems if someone wants to come and implement a different realtime scheduler in the future. XEN_SCHEDULER_RT_DS doesn't sound too bad. ~Andrew ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 15:40 ` Andrew Cooper @ 2014-07-11 15:48 ` Dario Faggioli 2014-07-16 17:05 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 15:48 UTC (permalink / raw) To: Andrew Cooper Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 1279 bytes --] On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote: > On 11/07/14 16:21, Dario Faggioli wrote: > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource > > Reservation Deferrable Server)? Just DS (Deferrable Server)? > > > > I'm not sure I like much any of these... > > > > What was it that you had in mind when you said "there are several > > classes of realtime schedulers employing different semantics and > > parameters" ? > Hmm - naming things is difficult. How about bob? > Well, if we go that route, I prefer Alice! :-D :-D > My concern is that naming it "rt" seems too generic, and will cause > confusion of a) what type of realtime scheduler it is, and b) further > problems if someone wants to come and implement a different realtime > scheduler in the future. > Totally understood and agreed. > XEN_SCHEDULER_RT_DS doesn't sound too bad. > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel folks... Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-11 15:48 ` Dario Faggioli @ 2014-07-16 17:05 ` Konrad Rzeszutek Wilk 2014-07-17 10:12 ` Meng Xu 0 siblings, 1 reply; 47+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-07-16 17:05 UTC (permalink / raw) To: Dario Faggioli Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, Andrew Cooper, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote: > On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote: > > On 11/07/14 16:21, Dario Faggioli wrote: > > > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS (Resource > > > Reservation Deferrable Server)? Just DS (Deferrable Server)? > > > > > > I'm not sure I like much any of these... > > > > > > What was it that you had in mind when you said "there are several > > > classes of realtime schedulers employing different semantics and > > > parameters" ? > > > Hmm - naming things is difficult. How about bob? > > > Well, if we go that route, I prefer Alice! :-D :-D > > > My concern is that naming it "rt" seems too generic, and will cause > > confusion of a) what type of realtime scheduler it is, and b) further > > problems if someone wants to come and implement a different realtime > > scheduler in the future. > > > Totally understood and agreed. > > > XEN_SCHEDULER_RT_DS doesn't sound too bad. > > > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel > folks... DS9 :-) XEN_SCHEDULER_RT_PGEDF ? And then we can also have XEN_SCHEDULER_RT_CBS ? > > Thanks and Regards, > Dario > > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-16 17:05 ` Konrad Rzeszutek Wilk @ 2014-07-17 10:12 ` Meng Xu 2014-07-17 15:12 ` Dario Faggioli 2014-07-18 18:40 ` Konrad Rzeszutek Wilk 0 siblings, 2 replies; 47+ messages in thread From: Meng Xu @ 2014-07-17 10:12 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, Andrew Cooper, Dario Faggioli, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 3545 bytes --] Hi Dario and Konrad, First, thank you very much for your detailed and very useful comments on this patch! I'm modifying it based on your comments now. (I'm sorry I didn't reply promptly because I was in travel these days and cannot always get access to network. :-() Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>于2014年7月16日星期三写道: > On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote: > > On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote: > > > On 11/07/14 16:21, Dario Faggioli wrote: > > > > > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS > (Resource > > > > Reservation Deferrable Server)? Just DS (Deferrable Server)? > > > > > > > > I'm not sure I like much any of these... > > > > > > > > What was it that you had in mind when you said "there are several > > > > classes of realtime schedulers employing different semantics and > > > > parameters" ? > > > > > Hmm - naming things is difficult. How about bob? > > > > > Well, if we go that route, I prefer Alice! :-D :-D > > > > > My concern is that naming it "rt" seems too generic, and will cause > > > confusion of a) what type of realtime scheduler it is, and b) further > > > problems if someone wants to come and implement a different realtime > > > scheduler in the future. > > > > > Totally understood and agreed. > > > > > XEN_SCHEDULER_RT_DS doesn't sound too bad. > > > > > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel > > folks... > > DS9 :-) > > XEN_SCHEDULER_RT_PGEDF ? > > And then we can also have > XEN_SCHEDULER_RT_CBS ? > > I see your concerns about the name here. The strength of the name you proposed, such as XEN_SCHEDULER_RT_PGEDF or XEN_SCHEDULER_RT_DS, is that it clearly states the characteristics of this scheduler. However, if people want to implement a new server mechanism for the EDF scheduling policy, they actually don't have to implement a fresh new one and add new sched_*.c files. They only need to modify the budget_update() and burn_budget() functions based on the new server mechanism. (Actually, we can make this scheduler more generic to make it easier to add a new server mechanism just as the schedule.c does.) In addition, if people want to implement a new real-time scheduling policy, like Rate Monotonic scheduling, they only need to modify a few lines in sched_rt.c. (We actually had the Rate Monotonic scheduling already, but only extract the EDF one for Xen right now to avoid causing unnecessary confusion.) So adding new server mechanisms and adding new scheduling policy (i.e., priority schemes) only requires modify several functions in sched_rt.c, instead of creating a fresh new file and scheduler. If we use the too specific name, like XEN_SCHEDULER_RT_DS, we will have to modify the name in the future when we extend the scheduler. 
Of course, we could also implement a fresh new scheduler, such as XEN_SCHEDULER_RT_CBS, or XEN_SCHEDULER_RT_PS (periodic server), in a brand new file, but it's actually unnecessary to introduce a new file. Of course, I'm not against using the more specific name, such as XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new server mechanism or new scheduling policy inside the current sched_rt.c, the too specific name will become impropriate and we will have to change name again. :-) Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 4233 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
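To make the argument above concrete, here is a minimal, self-contained sketch of how the server mechanism can stay confined to the budget-accounting hook, so that moving from a deferrable server to, say, a polling server changes only that one function. The struct and function names are invented for illustration; this is not the actual sched_rt.c code.

#include <stdbool.h>
#include <stdint.h>

/* Invented type: just enough state to show where the server policy lives.
 * Replenishing the budget to full at each period boundary is common to
 * both variants and omitted here. */
struct sketch_vcpu {
    int64_t  cur_budget;   /* budget left in the current period, in ns */
    uint64_t period;       /* replenishment period, in ns              */
    uint64_t last_sched;   /* last time accounting was done, in ns     */
};

/* Deferrable server (what this series implements): budget is burned only
 * while the VCPU actually runs and is preserved while it idles. */
void burn_budget_ds(struct sketch_vcpu *v, uint64_t now, bool running)
{
    if (running)
        v->cur_budget -= (int64_t)(now - v->last_sched);
    v->last_sched = now;
}

/* Polling-server variant: an idle VCPU immediately loses whatever budget
 * is left in the current period.  Swapping this in touches nothing else. */
void burn_budget_polling(struct sketch_vcpu *v, uint64_t now, bool running)
{
    if (running)
        v->cur_budget -= (int64_t)(now - v->last_sched);
    else
        v->cur_budget = 0;
    v->last_sched = now;
}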
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-17 10:12 ` Meng Xu @ 2014-07-17 15:12 ` Dario Faggioli 2014-07-18 5:46 ` Meng Xu 2014-07-18 18:40 ` Konrad Rzeszutek Wilk 1 sibling, 1 reply; 47+ messages in thread From: Dario Faggioli @ 2014-07-17 15:12 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, Andrew Cooper, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 4585 bytes --] On gio, 2014-07-17 at 06:12 -0400, Meng Xu wrote: > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>于2014年7月16日星期三写 > 道: > DS9 :-) > > XEN_SCHEDULER_RT_PGEDF ? > > And then we can also have > XEN_SCHEDULER_RT_CBS ? > > > > I see your concerns about the name here. > The strength of the name you proposed, such as XEN_SCHEDULER_RT_PGEDF > or XEN_SCHEDULER_RT_DS, is that it clearly states the characteristics > of this scheduler. > Agreed, and I'd go fo XEN_SCHEDULER_RT_DS. It means people have to learn at least the basics of what _D_eferrable _S_erver means (which in turns mean we must provide decent documentation! :-P), but if we want to go for being specific, let's go there at full blast :-) (and being specific means mentioning the actual algorithm, while GEDF is sort of a 'basis', that many algorithms can share, and I don't think I got what the 'P' in PGEDF stands for... did you perhaps mean 'PG'='Partitioned & Global'?) > However, if people want to implement a new server mechanism for the > EDF scheduling policy, they actually don't have to implement a fresh > new one and add new sched_*.c files. They only need to modify the > budget_update() and burn_budget() functions based on the new server > mechanism. (Actually, we can make this scheduler more generic to make > it easier to add a new server mechanism just as the schedule.c does.) > > In addition, if people want to implement a new real-time scheduling > policy, like Rate Monotonic scheduling, they only need to modify a few > lines in sched_rt.c. (We actually had the Rate Monotonic scheduling > already, but only extract the EDF one for Xen right now to avoid > causing unnecessary confusion.) > > So adding new server mechanisms and adding new scheduling policy > (i.e., priority schemes) only requires modify several functions in > sched_rt.c, instead of creating a fresh new file and scheduler. If we > use the too specific name, like XEN_SCHEDULER_RT_DS, we will have to > modify the name in the future when we extend the scheduler. Of > course, we could also implement a fresh new scheduler, such as > XEN_SCHEDULER_RT_CBS, or XEN_SCHEDULER_RT_PS (periodic server), in a > brand new file, but it's actually unnecessary to introduce a new > file. > This is all true. However, I think I agree with Andrew and Konrad about the need of being a bit more specific with naming. Probably, I'd distinguish the name of the source file and the name of the scheduling policy(ies). In fact, as you say, sched_rt.cc is probably fine, as one may just implement new/different things hacking its content, without the need of adding a new one. At that point, you can well introduce, say, XEN_SCHEDULER_RT_CBS, which, as far as the implementation is concerned, only modifies a function or two in sched_rt.c, but for the user perspective, it's a different scheduling policy, and it makes sense for it to have a different name. 
Both (or more, if we like) scheduling policies may well even coexist, share most of the code in sched_rt.c, and yet being two different scheduling policies... > Of course, I'm not against using the more specific name, such as > XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new > server mechanism or new scheduling policy inside the current > sched_rt.c, the too specific name will become impropriate and we will > have to change name again. :-) > As I said, it's always going to be an RT related scheduling policy so, if the algorithm will be similar enough, it could be a first class citizen inside sched_rt.c. If we want to have more than a scheduling policy, we *need* names (of the policies, not of the source file) to be specific. The only downside of the specific name is if we at some point will want to _replace_, rather than add, the DS algorithm with some other thing. But even in that case, I don't think there will be much issues: at the Xen level (not much compatibility requirements) we can just kill the old and introduce the new. At the libxl level, we'll handle API stability in the usual ways. So, yes, I think I'm all for XEN_SCHEDULER_RT_DS implemented in xen/common/sched_rt.c. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-17 15:12 ` Dario Faggioli @ 2014-07-18 5:46 ` Meng Xu 0 siblings, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-18 5:46 UTC (permalink / raw) To: Dario Faggioli Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, Andrew Cooper, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 1317 bytes --] Hi Dario, > > Of course, I'm not against using the more specific name, such as > > XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new > > server mechanism or new scheduling policy inside the current > > sched_rt.c, the too specific name will become impropriate and we will > > have to change name again. :-) > > > As I said, it's always going to be an RT related scheduling policy so, > if the algorithm will be similar enough, it could be a first class > citizen inside sched_rt.c. > > If we want to have more than a scheduling policy, we *need* names (of > the policies, not of the source file) to be specific. > > The only downside of the specific name is if we at some point will want > to _replace_, rather than add, the DS algorithm with some other thing. > But even in that case, I don't think there will be much issues: at the > Xen level (not much compatibility requirements) we can just kill the old > and introduce the new. At the libxl level, we'll handle API stability in > the usual ways. > > So, yes, I think I'm all for XEN_SCHEDULER_RT_DS implemented in > xen/common/sched_rt.c. > > OK. I got it. I changed the name from XEN_SCHEDULER_RT to XEN_SCHEDULER_RT_DS. Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 2365 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor 2014-07-17 10:12 ` Meng Xu 2014-07-17 15:12 ` Dario Faggioli @ 2014-07-18 18:40 ` Konrad Rzeszutek Wilk 1 sibling, 0 replies; 47+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-07-18 18:40 UTC (permalink / raw) To: Meng Xu, josh.whitehead, robert.vanvossen, Paul.Skentzos, Steve.VanderLeest Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, Andrew Cooper, Dario Faggioli, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu On Thu, Jul 17, 2014 at 06:12:22AM -0400, Meng Xu wrote: > Hi Dario and Konrad, > > First, thank you very much for your detailed and very useful comments on > this patch! I'm modifying it based on your comments now. (I'm sorry I > didn't reply promptly because I was in travel these days and cannot always > get access to network. :-() Hey, CCing some of the DornerWorks folks. > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>于2014年7月16日星期三写道: > > > On Fri, Jul 11, 2014 at 05:48:17PM +0200, Dario Faggioli wrote: > > > On ven, 2014-07-11 at 16:40 +0100, Andrew Cooper wrote: > > > > On 11/07/14 16:21, Dario Faggioli wrote: > > > > > > > > So, ideas? RTDS (as per Real-Time Deferrable-Server)? RRESDS > > (Resource > > > > > Reservation Deferrable Server)? Just DS (Deferrable Server)? > > > > > > > > > > I'm not sure I like much any of these... > > > > > > > > > > What was it that you had in mind when you said "there are several > > > > > classes of realtime schedulers employing different semantics and > > > > > parameters" ? > > > > > > > Hmm - naming things is difficult. How about bob? > > > > > > > Well, if we go that route, I prefer Alice! :-D :-D > > > > > > > My concern is that naming it "rt" seems too generic, and will cause > > > > confusion of a) what type of realtime scheduler it is, and b) further > > > > problems if someone wants to come and implement a different realtime > > > > scheduler in the future. > > > > > > > Totally understood and agreed. > > > > > > > XEN_SCHEDULER_RT_DS doesn't sound too bad. > > > > > > > It does not, indeed. Let's hear Meng, Sisu, and the other xen-devel > > > folks... > > > > DS9 :-) > > > > XEN_SCHEDULER_RT_PGEDF ? > > > > And then we can also have > > XEN_SCHEDULER_RT_CBS ? > > > > > I see your concerns about the name here. > The strength of the name you proposed, such as XEN_SCHEDULER_RT_PGEDF or > XEN_SCHEDULER_RT_DS, is that it clearly states the characteristics of this > scheduler. > > However, if people want to implement a new server mechanism for the EDF > scheduling policy, they actually don't have to implement a fresh new one > and add new sched_*.c files. They only need to modify the budget_update() > and burn_budget() functions based on the new server mechanism. (Actually, > we can make this scheduler more generic to make it easier to add a new > server mechanism just as the schedule.c does.) > > In addition, if people want to implement a new real-time scheduling policy, > like Rate Monotonic scheduling, they only need to modify a few lines in > sched_rt.c. (We actually had the Rate Monotonic scheduling already, but > only extract the EDF one for Xen right now to avoid causing unnecessary > confusion.) > > So adding new server mechanisms and adding new scheduling policy (i.e., > priority schemes) only requires modify several functions in sched_rt.c, > instead of creating a fresh new file and scheduler. 
If we use the too > specific name, like XEN_SCHEDULER_RT_DS, we will have to modify the name in > the future when we extend the scheduler. Of course, we could also > implement a fresh new scheduler, such as XEN_SCHEDULER_RT_CBS, or > XEN_SCHEDULER_RT_PS (periodic server), in a brand new file, but it's > actually unnecessary to introduce a new file. Joshua, Robert, et. al, Would it be feasible to rebase the core changes on top of this file instead of using the SEDF? > > Of course, I'm not against using the more specific name, such as > XEN_SCHEDULER_RT_DS. What I'm concerned is: when we implement a new server > mechanism or new scheduling policy inside the current sched_rt.c, the too > specific name will become impropriate and we will have to change name > again. :-) Right. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu 2014-07-11 4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu @ 2014-07-11 4:49 ` Meng Xu 2014-07-11 11:02 ` Wei Liu 2014-07-11 4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu ` (3 subsequent siblings) 5 siblings, 1 reply; 47+ messages in thread From: Meng Xu @ 2014-07-11 4:49 UTC (permalink / raw) To: xen-devel Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659, dgolomb Add xl command for rt scheduler Signed-off-by: Sisu Xi <xisisu@gmail.com> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> --- docs/man/xl.pod.1 | 40 +++++++++++++ tools/libxl/xl.h | 1 + tools/libxl/xl_cmdimpl.c | 141 +++++++++++++++++++++++++++++++++++++++++++++ tools/libxl/xl_cmdtable.c | 9 +++ 4 files changed, 191 insertions(+) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index 30bd4bf..42aeedc 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -1019,6 +1019,46 @@ Restrict output to domains in the specified cpupool. =back +=item B<sched-rt> [I<OPTIONS>] + +Set or get rt (Real Time) scheduler parameters. This rt scheduler applies +Preemptive Global Earliest Deadline First real-time scheduling algorithm to +schedule VCPUs in the system. Each VCPU has a dedicated period and budget. +While scheduled, a VCPU burns its budget. +A VCPU has its budget replenished at the beginning of each of its periods; +The VCPU discards its unused budget at the end of its periods. + +B<OPTIONS> + +=over 4 + +=item B<-d DOMAIN>, B<--domain=DOMAIN> + +Specify domain for which scheduler parameters are to be modified or retrieved. +Mandatory for modifying scheduler parameters. + +=item B<-v VCPU>, B<--vcpu=VCPU> + +Specify the index of VCPU whose parameters will be set. +A domain can have multiple VCPUs; Each VCPU has a unique index in this domain; +When set domain's parameters, it needs to set each VCPU's parameters of this +domain. + +=item B<-p PERIOD>, B<--period=PERIOD> + +A VCPU replenish its budget in every period. Time unit is millisecond. + +=item B<-b BUDGET>, B<--budget=BUDGET> + +A VCPU has BUDGET amount of time to run for each period. +Time unit is millisecond. + +=item B<-c CPUPOOL>, B<--cpupool=CPUPOOL> + +Restrict output to domains in the specified cpupool. 
+ +=back + =back =head1 CPUPOOLS COMMANDS diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h index 10a2e66..51b634a 100644 --- a/tools/libxl/xl.h +++ b/tools/libxl/xl.h @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv); int main_sched_credit(int argc, char **argv); int main_sched_credit2(int argc, char **argv); int main_sched_sedf(int argc, char **argv); +int main_sched_rt(int argc, char **argv); int main_domid(int argc, char **argv); int main_domname(int argc, char **argv); int main_rename(int argc, char **argv); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 68df548..c043f88 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv) return 0; } + static int sched_domain_get(libxl_scheduler sched, int domid, libxl_domain_sched_params *scinfo) { @@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output( return 0; } + +static int sched_rt_domain_output( + int domid) +{ + char *domname; + libxl_domain_sched_params scinfo; + int rc, i; + + if (domid < 0) { + printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU", "Period", "Budget"); + return 0; + } + + libxl_domain_sched_params_init(&scinfo); + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo); + if (rc) + return rc; + + domname = libxl_domid_to_name(ctx, domid); + for( i = 0; i < scinfo.rt.num_vcpus; i++ ) + { + printf("%-33s %4d %4d %6d %6d\n", + domname, + domid, + i, + scinfo.rt.vcpus[i].period, + scinfo.rt.vcpus[i].budget); + } + free(domname); + + libxl_domain_sched_params_dispose(&scinfo); + + return 0; +} + +static int sched_rt_pool_output(uint32_t poolid) +{ + char *poolname; + + poolname = libxl_cpupoolid_to_name(ctx, poolid); + printf("Cpupool %s: sched=EDF\n", poolname); + + free(poolname); + return 0; +} + static int sched_default_pool_output(uint32_t poolid) { char *poolname; @@ -5465,6 +5512,100 @@ int main_sched_sedf(int argc, char **argv) return 0; } +/* + * <nothing> : List all domain paramters and sched params + * -d [domid] : List domain params for domain + * -d [domid] [params] : Set domain params for domain + */ +int main_sched_rt(int argc, char **argv) +{ + const char *dom = NULL; + const char *cpupool = NULL; + int period = 10, opt_p = 0; + int budget = 4, opt_b = 0; + int vcpu_index = 0, opt_v = 0; + int opt, rc; + static struct option opts[] = { + {"domain", 1, 0, 'd'}, + {"period", 1, 0, 'p'}, + {"budget", 1, 0, 'b'}, + {"vcpu", 1, 0, 'v'}, + {"cpupool", 1, 0, 'c'}, + COMMON_LONG_OPTS, + {0, 0, 0, 0} + }; + + SWITCH_FOREACH_OPT(opt, "d:p:b:v:c:h", opts, "sched-rt", 0) { + case 'd': + dom = optarg; + break; + case 'p': + period = strtol(optarg, NULL, 10); + opt_p = 1; + break; + case 'b': + budget = strtol(optarg, NULL, 10); + opt_b = 1; + break; + case 'v': + vcpu_index = strtol(optarg, NULL, 10); + opt_v = 1; + break; + case 'c': + cpupool = optarg; + break; + } + + if (cpupool && (dom || opt_p || opt_b || opt_v)) { + fprintf(stderr, "Specifying a cpupool is not allowed with other options.\n"); + return 1; + } + if (!dom && (opt_p || opt_b || opt_v)) { + fprintf(stderr, "Must specify a domain.\n"); + return 1; + } + if ( (opt_v || opt_p || opt_b) && (opt_p + opt_b + opt_v != 3) ) { + fprintf(stderr, "Must specify vcpu, period, budget\n"); + return 1; + } + + if (!dom) { /* list all domain's rt scheduler info */ + return -sched_domain_output(LIBXL_SCHEDULER_RT, + sched_rt_domain_output, + sched_rt_pool_output, + cpupool); + } else { + uint32_t domid = find_domain(dom); + if (!opt_p && !opt_b && !opt_v) { /* 
output rt scheduler info */ + sched_rt_domain_output(-1); + return -sched_rt_domain_output(domid); + } else { /* set rt scheduler paramaters */ + libxl_domain_sched_params scinfo; + libxl_domain_sched_params_init(&scinfo); + scinfo.rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS; + /* TODO: only alloc an array with the same num of dom's vcpus*/ + scinfo.rt.num_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS; + scinfo.rt.vcpus = + (libxl_vcpu*) malloc( sizeof(libxl_vcpu) * scinfo.rt.max_vcpus ); + if ( scinfo.rt.vcpus == NULL ) { + fprintf(stderr, "Alloc memory for scinfo.rt.vcpus fails\n"); + return 1; + } + scinfo.sched = LIBXL_SCHEDULER_RT; + scinfo.rt.vcpu_index = vcpu_index; + scinfo.rt.vcpus[vcpu_index].period = period; + scinfo.rt.vcpus[vcpu_index].budget = budget; + + rc = sched_domain_set(domid, &scinfo); + libxl_domain_sched_params_dispose(&scinfo); + if (rc) + return -rc; + } + } + + return 0; +} + int main_domid(int argc, char **argv) { uint32_t domid; diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c index 4279b9f..70e2585 100644 --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -277,6 +277,15 @@ struct cmd_spec cmd_table[] = { " --period/--slice)\n" "-c CPUPOOL, --cpupool=CPUPOOL Restrict output to CPUPOOL" }, + { "sched-rt", + &main_sched_rt, 0, 1, + "Get/set rt scheduler parameters", + "[-d <Domain> [-v[=VCPU]] [-p[=PERIOD]] [-b[=BUDGET]]]", + "-d DOMAIN, --domain=DOMAIN Domain to modify\n" + "-v VCPU, --vcpu=VCPU VCPU\n" + "-p PERIOD, --period=PERIOD Period (ms)\n" + "-b BUDGET, --budget=BUDGET Budget (ms)\n" + }, { "domid", &main_domid, 0, 0, "Convert a domain name to domain id", -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu @ 2014-07-11 11:02 ` Wei Liu 2014-07-11 14:59 ` Meng Xu 0 siblings, 1 reply; 47+ messages in thread From: Wei Liu @ 2014-07-11 11:02 UTC (permalink / raw) To: Meng Xu Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb On Fri, Jul 11, 2014 at 12:49:56AM -0400, Meng Xu wrote: [...] > > =head1 CPUPOOLS COMMANDS > diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h > index 10a2e66..51b634a 100644 > --- a/tools/libxl/xl.h > +++ b/tools/libxl/xl.h > @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv); > int main_sched_credit(int argc, char **argv); > int main_sched_credit2(int argc, char **argv); > int main_sched_sedf(int argc, char **argv); > +int main_sched_rt(int argc, char **argv); > int main_domid(int argc, char **argv); > int main_domname(int argc, char **argv); > int main_rename(int argc, char **argv); > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 68df548..c043f88 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv) > return 0; > } > > + Stray blank line. > static int sched_domain_get(libxl_scheduler sched, int domid, > libxl_domain_sched_params *scinfo) > { > @@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output( > return 0; > } > > + > +static int sched_rt_domain_output( > + int domid) > +{ > + char *domname; > + libxl_domain_sched_params scinfo; > + int rc, i; > + > + if (domid < 0) { > + printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU", "Period", "Budget"); > + return 0; > + } > + Just print the header and nothing more? > + libxl_domain_sched_params_init(&scinfo); > + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo); > + if (rc) > + return rc; > + This is violating libxl type paradigm. See libxl.h. 263 * libxl types 264 * 265 * Most libxl types are defined by the libxl IDL (see 266 * libxl_types.idl). The library provides a common set of methods for 267 * initialising and freeing these types. 268 * 269 * IDL-generated libxl types should be used as follows: the user must 270 * always call the "init" function before using a type, even if the 271 * variable is simply being passed by reference as an out parameter 272 * to a libxl function. The user must always calls "dispose" exactly 273 * once afterwards, to clean up, regardless of whether operations on 274 * this object succeeded or failed. See the xl code for examples. And you should check other places that use libxl types as well. Wei. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 11:02 ` Wei Liu @ 2014-07-11 14:59 ` Meng Xu 2014-07-11 15:07 ` Dario Faggioli 0 siblings, 1 reply; 47+ messages in thread From: Meng Xu @ 2014-07-11 14:59 UTC (permalink / raw) To: Wei Liu Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 3844 bytes --] Hi Wei, First, thank you very much for your comments! > > =head1 CPUPOOLS COMMANDS > > diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h > > index 10a2e66..51b634a 100644 > > --- a/tools/libxl/xl.h > > +++ b/tools/libxl/xl.h > > @@ -67,6 +67,7 @@ int main_memset(int argc, char **argv); > > int main_sched_credit(int argc, char **argv); > > int main_sched_credit2(int argc, char **argv); > > int main_sched_sedf(int argc, char **argv); > > +int main_sched_rt(int argc, char **argv); > > int main_domid(int argc, char **argv); > > int main_domname(int argc, char **argv); > > int main_rename(int argc, char **argv); > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > > index 68df548..c043f88 100644 > > --- a/tools/libxl/xl_cmdimpl.c > > +++ b/tools/libxl/xl_cmdimpl.c > > @@ -4947,6 +4947,7 @@ int main_sharing(int argc, char **argv) > > return 0; > > } > > > > + > > Stray blank line. > I will delete the blank line. > > > static int sched_domain_get(libxl_scheduler sched, int domid, > > libxl_domain_sched_params *scinfo) > > { > > @@ -5098,6 +5099,52 @@ static int sched_sedf_domain_output( > > return 0; > > } > > > > + > > +static int sched_rt_domain_output( > > + int domid) > > +{ > > + char *domname; > > + libxl_domain_sched_params scinfo; > > + int rc, i; > > + > > + if (domid < 0) { > > + printf("%-33s %4s %4s %6s %6s\n", "Name", "ID", "VCPU", > "Period", "Budget"); > > + return 0; > > + } > > + > > Just print the header and nothing more? > We just print out the header here. This function will call sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo) and print out the VCPUs parameters of each domain in the calling function. > > > + libxl_domain_sched_params_init(&scinfo); > > + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, &scinfo); > > + if (rc) > > + return rc; > > + > > This is violating libxl type paradigm. See libxl.h. > 263 * libxl types > 264 * > 265 * Most libxl types are defined by the libxl IDL (see > 266 * libxl_types.idl). The library provides a common set of methods for > 267 * initialising and freeing these types. > 268 * > 269 * IDL-generated libxl types should be used as follows: the user must > 270 * always call the "init" function before using a type, even if the > 271 * variable is simply being passed by reference as an out parameter > 272 * to a libxl function. The user must always calls "dispose" exactly > 273 * once afterwards, to clean up, regardless of whether operations on > 274 * this object succeeded or failed. See the xl code for examples. > > And you should check other places that use libxl types as well. > Thank you very much for pasting the rules here! I really appreciate it. However, I didn't quite get why it violate the libxl type paradigm and how I should correct it. (Sorry. 
:-() We actually followed the way credit scheduler does in main_sched_credit(int argc, char **argv) } else { /* set credit scheduler paramaters */ libxl_domain_sched_params scinfo; libxl_domain_sched_params_init(&scinfo); scinfo.sched = LIBXL_SCHEDULER_CREDIT; if (opt_w) scinfo.weight = weight; if (opt_c) scinfo.cap = cap; rc = sched_domain_set(domid, &scinfo); libxl_domain_sched_params_dispose(&scinfo); Could you help let me know how I should modify it? I will check the rest to modify when I know how to do it. Thank you very much! Best, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 6705 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 14:59 ` Meng Xu @ 2014-07-11 15:07 ` Dario Faggioli 2014-07-11 16:25 ` Meng Xu 2014-07-13 12:58 ` Meng Xu 0 siblings, 2 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 15:07 UTC (permalink / raw) To: Meng Xu Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 1829 bytes --] On ven, 2014-07-11 at 10:59 -0400, Meng Xu wrote: > > + libxl_domain_sched_params_init(&scinfo); > > + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, > &scinfo); > > + if (rc) > > + return rc; > > + > Thank you very much for pasting the rules here! I really appreciate > it. However, I didn't quite get why it violate the libxl type paradigm > and how I should correct it. (Sorry. :-() > <<The user must always calls "dispose" exactly once afterwards, to clean up, regardless of whether operations on this object succeeded or failed>> While, above, you're exiting, if rc is true, without calling dispose. It depens a lot on the function, but what you usually do, is grouping the calls to the various dispose under a label (typically 'out:') and goto there to exit. Look around, both in xl and libxl, you'll find plenty of examples of that. > We actually followed the way credit scheduler does > in main_sched_credit(int argc, char **argv) > > > } else { /* set credit scheduler paramaters */ > libxl_domain_sched_params scinfo; > libxl_domain_sched_params_init(&scinfo); > scinfo.sched = LIBXL_SCHEDULER_CREDIT; > if (opt_w) > scinfo.weight = weight; > if (opt_c) > scinfo.cap = cap; > rc = sched_domain_set(domid, &scinfo); > libxl_domain_sched_params_dispose(&scinfo); > And in fact, here's dispose! :-) Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
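A minimal sketch of the pattern described above; the wrapper function name is invented, only the libxl type and calls are real, and error handling is kept to the bare minimum.

#include <stdint.h>
#include <stdio.h>
#include <libxl.h>

/* Sketch only: init first, route every exit through one label, dispose
 * exactly once whether the libxl call succeeded or failed. */
int print_sched_weight(libxl_ctx *ctx, uint32_t domid)
{
    libxl_domain_sched_params scinfo;
    int rc;

    libxl_domain_sched_params_init(&scinfo);

    rc = libxl_domain_sched_params_get(ctx, domid, &scinfo);
    if (rc)
        goto out;            /* no early return: cleanup below still runs */

    printf("sched=%d weight=%d\n", scinfo.sched, scinfo.weight);

out:
    libxl_domain_sched_params_dispose(&scinfo);  /* exactly once, on every path */
    return rc;
}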
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 15:07 ` Dario Faggioli @ 2014-07-11 16:25 ` Meng Xu 2014-07-13 12:58 ` Meng Xu 1 sibling, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-11 16:25 UTC (permalink / raw) To: Dario Faggioli Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 1271 bytes --] Hi Dario, 2014-07-11 11:07 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>: > On ven, 2014-07-11 at 10:59 -0400, Meng Xu wrote: > > > > > + libxl_domain_sched_params_init(&scinfo); > > > + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, > > &scinfo); > > > + if (rc) > > > + return rc; > > > + > > > Thank you very much for pasting the rules here! I really appreciate > > it. However, I didn't quite get why it violate the libxl type paradigm > > and how I should correct it. (Sorry. :-() > > > <<The user must always calls "dispose" exactly once afterwards, to clean > up, regardless of whether operations on this object succeeded or > failed>> > > While, above, you're exiting, if rc is true, without calling dispose. > > It depens a lot on the function, but what you usually do, is grouping > the calls to the various dispose under a label (typically 'out:') and > goto there to exit. > > Look around, both in xl and libxl, you'll find plenty of examples of > that. > Now I got it. Thank you very much! I will modify it. :-) Best regards, Meng ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 2175 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-11 15:07 ` Dario Faggioli 2014-07-11 16:25 ` Meng Xu @ 2014-07-13 12:58 ` Meng Xu 2014-07-14 7:40 ` Dario Faggioli ` (2 more replies) 1 sibling, 3 replies; 47+ messages in thread From: Meng Xu @ 2014-07-13 12:58 UTC (permalink / raw) To: Dario Faggioli Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 2552 bytes --] Hi Dario, > > > > > + libxl_domain_sched_params_init(&scinfo); > > > + rc = sched_domain_get(LIBXL_SCHEDULER_RT, domid, > > &scinfo); > > > + if (rc) > > > + return rc; > > > + > > > Thank you very much for pasting the rules here! I really appreciate > > it. However, I didn't quite get why it violate the libxl type paradigm > > and how I should correct it. (Sorry. :-() > > > <<The user must always calls "dispose" exactly once afterwards, to clean > up, regardless of whether operations on this object succeeded or > failed>> > > While, above, you're exiting, if rc is true, without calling dispose. > > It depens a lot on the function, but what you usually do, is grouping > the calls to the various dispose under a label (typically 'out:') and > goto there to exit. > > Look around, both in xl and libxl, you'll find plenty of examples of > that. > > > We actually followed the way credit scheduler does > > in main_sched_credit(int argc, char **argv) > > > > > > } else { /* set credit scheduler paramaters */ > > libxl_domain_sched_params scinfo; > > libxl_domain_sched_params_init(&scinfo); > > scinfo.sched = LIBXL_SCHEDULER_CREDIT; > > if (opt_w) > > scinfo.weight = weight; > > if (opt_c) > > scinfo.cap = cap; > > rc = sched_domain_set(domid, &scinfo); > > libxl_domain_sched_params_dispose(&scinfo); > > > And in fact, here's dispose! :-) > > I think you are right! But I think the implementation of this functionality for credit scheduler also has the exactly same issue: scinfo will not be disposed when hypercall returns false. Here is the code for sched_credit_domain_output(int domid) in tools/libxl/xl_cmdimpl.c. rc = sched_domain_get(LIBXL_SCHEDULER_CREDIT, domid, &scinfo); if (rc) return rc; domname = libxl_domid_to_name(ctx, domid); printf("%-33s %4d %6d %4d\n", domname, domid, scinfo.weight, scinfo.cap); free(domname); libxl_domain_sched_params_dispose(&scinfo); (Note: sched_domain_get() init the scinfo, but didn't dispose it. ) As you can see, it has the exact issue I had. :-) If it's the case, I can submit a separate patch for this. :-) Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 4247 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-13 12:58 ` Meng Xu @ 2014-07-14 7:40 ` Dario Faggioli 2014-07-14 9:31 ` Wei Liu 2014-07-17 15:39 ` Ian Campbell 2 siblings, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-14 7:40 UTC (permalink / raw) To: Meng Xu Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 900 bytes --] On dom, 2014-07-13 at 08:58 -0400, Meng Xu wrote: > I think you are right! But I think the implementation of this > functionality for credit scheduler also has the exactly same issue: > scinfo will not be disposed when hypercall returns false. > (Note: sched_domain_get() init the scinfo, but didn't dispose it. ) > ISWYM. Yes, I think either in sched_domain_get(), or at the three (four considering your code) callsites, a _dispose() would be good. > As you can see, it has the exact issue I had. :-) If it's the case, I > can submit a separate patch for this. :-) > Go ahead. :-) Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-13 12:58 ` Meng Xu 2014-07-14 7:40 ` Dario Faggioli @ 2014-07-14 9:31 ` Wei Liu 2014-07-17 15:39 ` Ian Campbell 2 siblings, 0 replies; 47+ messages in thread From: Wei Liu @ 2014-07-14 9:31 UTC (permalink / raw) To: Meng Xu Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb On Sun, Jul 13, 2014 at 08:58:01AM -0400, Meng Xu wrote: [...] > > > > > I think you are right! But I think the implementation of this functionality > for credit scheduler also has the exactly same issue: scinfo will not be > disposed when hypercall returns false. > The paradigm was documented not long ago, so I wouldn't be surprised if there's existing code that violates that paradigm. In any case we need to make sure new code follows our rule. :-) > Here is the code for sched_credit_domain_output(int domid) in > tools/libxl/xl_cmdimpl.c. > rc = sched_domain_get(LIBXL_SCHEDULER_CREDIT, domid, &scinfo); > > if (rc) > > return rc; > > domname = libxl_domid_to_name(ctx, domid); > > printf("%-33s %4d %6d %4d\n", > > domname, > > domid, > > scinfo.weight, > > scinfo.cap); > > free(domname); > > libxl_domain_sched_params_dispose(&scinfo); > > (Note: sched_domain_get() init the scinfo, but didn't dispose it. ) > > As you can see, it has the exact issue I had. :-) If it's the case, I can > submit a separate patch for this. :-) > That's a good idea. Wei. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 2/4] xl for rt scheduler 2014-07-13 12:58 ` Meng Xu 2014-07-14 7:40 ` Dario Faggioli 2014-07-14 9:31 ` Wei Liu @ 2014-07-17 15:39 ` Ian Campbell 2 siblings, 0 replies; 47+ messages in thread From: Ian Campbell @ 2014-07-17 15:39 UTC (permalink / raw) To: Meng Xu Cc: Wei Liu, Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb On Sun, 2014-07-13 at 08:58 -0400, Meng Xu wrote: > Look around, both in xl and libxl, you'll find plenty of > examples of > that. > > > We actually followed the way credit scheduler does > > in main_sched_credit(int argc, char **argv) > > > > > > } else { /* set credit scheduler paramaters */ > > libxl_domain_sched_params scinfo; > > libxl_domain_sched_params_init(&scinfo); > > scinfo.sched = LIBXL_SCHEDULER_CREDIT; > > if (opt_w) > > scinfo.weight = weight; > > if (opt_c) > > scinfo.cap = cap; > > rc = sched_domain_set(domid, &scinfo); > > libxl_domain_sched_params_dispose(&scinfo); > > > And in fact, here's dispose! :-) > > > > I think you are right! But I think the implementation of this > functionality for credit scheduler also has the exactly same issue: > scinfo will not be disposed when hypercall returns false. Yes, there's quite a lot of code in libxl and xl which is not correct in this area but which happens to be "OK" because the struct in question has no dynamically allocated members. So this is a latent bug since if we were to add a dynamic member to the struct all this code would become immediately buggy. We should try to stamp these out as we notice them (as we have done here). Ian. ^ permalink raw reply [flat|nested] 47+ messages in thread
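The latent-bug point can be illustrated with a small self-contained sketch; the names are invented, but the shape mirrors what happens once a dynamically allocated vcpus array, like the one this series adds, joins a parameters struct.

#include <stdlib.h>

/* Invented type: starts out with plain members only and later gains a
 * dynamically allocated array. */
struct sketch_params {
    int weight;                               /* plain member: nothing to free */
    int cap;                                  /* plain member: nothing to free */
    struct { int period, budget; } *vcpus;    /* dynamic member added later    */
    int num_vcpus;
};

void sketch_params_init(struct sketch_params *p)
{
    p->weight = -1;
    p->cap = -1;
    p->vcpus = NULL;
    p->num_vcpus = 0;
}

/* While the struct had no dynamic members this was effectively a no-op,
 * so callers that skipped it on error paths "got away with it"; once
 * vcpus exists, skipping it on any path leaks the array. */
void sketch_params_dispose(struct sketch_params *p)
{
    free(p->vcpus);
    p->vcpus = NULL;
    p->num_vcpus = 0;
}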
* [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu 2014-07-11 4:49 ` [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor Meng Xu 2014-07-11 4:49 ` [PATCH RFC v1 2/4] xl for rt scheduler Meng Xu @ 2014-07-11 4:49 ` Meng Xu 2014-07-11 11:05 ` Wei Liu ` (2 more replies) 2014-07-11 4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu ` (2 subsequent siblings) 5 siblings, 3 replies; 47+ messages in thread From: Meng Xu @ 2014-07-11 4:49 UTC (permalink / raw) To: xen-devel Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659, dgolomb Add libxl functions to set/get domain's parameters for rt scheduler Signed-off-by: Sisu Xi <xisisu@gmail.com> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> --- tools/libxl/libxl.c | 121 +++++++++++++++++++++++++++++++++++++++++++ tools/libxl/libxl.h | 7 +++ tools/libxl/libxl_types.idl | 13 +++++ 3 files changed, 141 insertions(+) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 39f1c28..670420e 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5100,6 +5100,121 @@ static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid, return 0; } +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid, + libxl_domain_sched_params *scinfo) +{ + struct xen_domctl_sched_rt_params sdom; + int rc, i; + + rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom); + if (rc != 0) { + LOGE(ERROR, "getting domain sched rt"); + return ERROR_FAIL; + } + + libxl_domain_sched_params_init(scinfo); + scinfo->sched = LIBXL_SCHEDULER_RT; + scinfo->rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS; + /*TODO: alloc array with exact num of dom's vcpus; and use GCNEW_ARRAY()*/ + scinfo->rt.vcpus = (libxl_vcpu *) + malloc( sizeof(libxl_vcpu) * scinfo->rt.max_vcpus ); + if ( !scinfo->rt.vcpus ){ + LOGE(ERROR, "Allocate lib_vcpu array fails\n"); + return ERROR_INVAL; + } + scinfo->rt.num_vcpus = sdom.num_vcpus; + for( i = 0; i < sdom.num_vcpus; ++i) + { + scinfo->rt.vcpus[i].period = sdom.vcpus[i].period; + scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget; + } + + return 0; +} + +#define SCHED_RT_VCPU_PARAMS_MAX 4294967295 /* 2^32-1 */ + +/* + * Sanity check of the scinfo parameters + * return 0 if all values are valid + * return 1 if one param is default value + * return 2 if the target vcpu's index, period or budget is out of range + */ +static int sched_rt_domain_set_validate_params(libxl__gc *gc, + const libxl_domain_sched_params *scinfo) +{ + int vcpu_index = scinfo->rt.vcpu_index; + + if (scinfo->rt.vcpus == NULL || + vcpu_index == LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT || + scinfo->rt.vcpus[vcpu_index].period == LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT || + scinfo->rt.vcpus[vcpu_index].budget == LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT) + { + return 1; + } + + if (vcpu_index < 0 || vcpu_index > scinfo->rt.num_vcpus) + { + LOG(ERROR, "VCPU index is not set or out of range, " + "valid values are within range from 0 to %d", scinfo->rt.num_vcpus); + return 2; + } + + if (scinfo->rt.vcpus[vcpu_index].period < 1 || + scinfo->rt.vcpus[vcpu_index].period > SCHED_RT_VCPU_PARAMS_MAX) + { + LOG(ERROR, "VCPU period is not set or out of range, " + "valid values are within range from 0 to %lu", SCHED_RT_VCPU_PARAMS_MAX); + return 2; + } + + if (scinfo->rt.vcpus[vcpu_index].budget < 1 || + scinfo->rt.vcpus[vcpu_index].budget > SCHED_RT_VCPU_PARAMS_MAX) + { + LOG(ERROR, "VCPU budget is not set or out of range, " + "valid values are within range from 0 to %lu", 
SCHED_RT_VCPU_PARAMS_MAX); + return 2; + } + + return 0; + +} + +static int sched_rt_domain_set(libxl__gc *gc, uint32_t domid, + const libxl_domain_sched_params *scinfo) +{ + struct xen_domctl_sched_rt_params sdom; + int vcpu_index; + int rc; + + rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom); + if (rc != 0) { + LOGE(ERROR, "getting domain sched rt"); + return ERROR_FAIL; + } + + rc = sched_rt_domain_set_validate_params(gc, scinfo); + if (rc == 2) + return ERROR_INVAL; + if (rc == 1) + return 0; + if (rc == 0) + { + vcpu_index = scinfo->rt.vcpu_index; + sdom.vcpu_index = scinfo->rt.vcpu_index; + sdom.vcpus[vcpu_index].period = scinfo->rt.vcpus[vcpu_index].period; + sdom.vcpus[vcpu_index].budget = scinfo->rt.vcpus[vcpu_index].budget; + } + + rc = xc_sched_rt_domain_set(CTX->xch, domid, &sdom); + if ( rc < 0 ) { + LOGE(ERROR, "setting domain sched rt"); + return ERROR_FAIL; + } + + return 0; +} + int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid, const libxl_domain_sched_params *scinfo) { @@ -5123,6 +5238,9 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid, case LIBXL_SCHEDULER_ARINC653: ret=sched_arinc653_domain_set(gc, domid, scinfo); break; + case LIBXL_SCHEDULER_RT: + ret=sched_rt_domain_set(gc, domid, scinfo); + break; default: LOG(ERROR, "Unknown scheduler"); ret=ERROR_INVAL; @@ -5153,6 +5271,9 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid, case LIBXL_SCHEDULER_CREDIT2: ret=sched_credit2_domain_get(gc, domid, scinfo); break; + case LIBXL_SCHEDULER_RT: + ret=sched_rt_domain_get(gc, domid, scinfo); + break; default: LOG(ERROR, "Unknown scheduler"); ret=ERROR_INVAL; diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 459557d..42c8ede 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1217,6 +1217,13 @@ int libxl_sched_credit_params_set(libxl_ctx *ctx, uint32_t poolid, #define LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT -1 #define LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT -1 +#define LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT -1 +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_DEFAULT -1 +#define LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT -1 +#define LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT -1 +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/ +#define LIBXL_XEN_LEGACY_MAX_VCPUS 32 + int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid, libxl_domain_sched_params *params); int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid, diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index de25f42..00e00d8 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -139,6 +139,7 @@ libxl_scheduler = Enumeration("scheduler", [ (5, "credit"), (6, "credit2"), (7, "arinc653"), + (8, "rt"), ]) # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN) @@ -289,6 +290,17 @@ libxl_domain_restore_params = Struct("domain_restore_params", [ ("checkpointed_stream", integer), ]) +libxl_rt_vcpu = Struct("vcpu",[ + ("period", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}), + ("budget", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}), + ]) + +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[ + ("vcpus", Array(libxl_rt_vcpu, "max_vcpus")), + ("num_vcpus", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_NUM_VCPUS_DEFAULT'}), + ("vcpu_index", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_VCPU_INDEX_DEFAULT'}), + ]) + libxl_domain_sched_params = Struct("domain_sched_params",[ ("sched", libxl_scheduler), ("weight", integer, 
{'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}), @@ -297,6 +309,7 @@ libxl_domain_sched_params = Struct("domain_sched_params",[ ("slice", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}), ("latency", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}), ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}), + ("rt", libxl_domain_sched_rt_params), ]) libxl_domain_build_info = Struct("domain_build_info",[ -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu @ 2014-07-11 11:05 ` Wei Liu 2014-07-11 15:08 ` Dario Faggioli 2014-07-17 15:36 ` Ian Campbell 2 siblings, 0 replies; 47+ messages in thread From: Wei Liu @ 2014-07-11 11:05 UTC (permalink / raw) To: Meng Xu Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb On Fri, Jul 11, 2014 at 12:49:57AM -0400, Meng Xu wrote: [...] > > # Consistent with SHUTDOWN_* in sched.h (apart from UNKNOWN) > @@ -289,6 +290,17 @@ libxl_domain_restore_params = Struct("domain_restore_params", [ > ("checkpointed_stream", integer), > ]) > > +libxl_rt_vcpu = Struct("vcpu",[ > + ("period", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}), > + ("budget", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}), > + ]) > + > +libxl_domain_sched_rt_params = Struct("domain_sched_rt_params",[ > + ("vcpus", Array(libxl_rt_vcpu, "max_vcpus")), The array counter should be named "num_vcpus" (and you probably need to rename the other "num_vcpus" to something else). Wei. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu 2014-07-11 11:05 ` Wei Liu @ 2014-07-11 15:08 ` Dario Faggioli 2014-07-12 18:16 ` Meng Xu 2014-07-17 15:34 ` Ian Campbell 2014-07-17 15:36 ` Ian Campbell 2 siblings, 2 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 15:08 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 2218 bytes --] On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > +static int sched_rt_domain_get(libxl__gc *gc, uint32_t domid, > + libxl_domain_sched_params *scinfo) > +{ > + struct xen_domctl_sched_rt_params sdom; > + int rc, i; > + > + rc = xc_sched_rt_domain_get(CTX->xch, domid, &sdom); > + if (rc != 0) { > + LOGE(ERROR, "getting domain sched rt"); > + return ERROR_FAIL; > + } > + > + libxl_domain_sched_params_init(scinfo); > + scinfo->sched = LIBXL_SCHEDULER_RT; > + scinfo->rt.max_vcpus = LIBXL_XEN_LEGACY_MAX_VCPUS; > + /*TODO: alloc array with exact num of dom's vcpus; and use GCNEW_ARRAY()*/ > Indeed. That, or just do an hypercall for each VCPU, at least as a first step. > + scinfo->rt.vcpus = (libxl_vcpu *) > + malloc( sizeof(libxl_vcpu) * scinfo->rt.max_vcpus ); > Yes, you said it yourself, I see: GCNEW_ARRAY() and no malloc() in libxl. > + if ( !scinfo->rt.vcpus ){ > + LOGE(ERROR, "Allocate lib_vcpu array fails\n"); > + return ERROR_INVAL; > + } > + scinfo->rt.num_vcpus = sdom.num_vcpus; > + for( i = 0; i < sdom.num_vcpus; ++i) > + { > + scinfo->rt.vcpus[i].period = sdom.vcpus[i].period; > + scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget; > + } > + > + return 0; > +} > + > +#define SCHED_RT_VCPU_PARAMS_MAX 4294967295 /* 2^32-1 */ > + I don't like the name. I'd go for something like SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX. You well can define both to the same value, of course. As per the value, yes, even if we turn the interface in usecs, 2^32 usecs is still ~4000 secs, which ought to be enough as a period or budget for everyone! :-) For the rest of this file, I think I'll wait for v2, as the interface is subject to be changed. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
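A rough sketch of what the TODO quoted above could become, sizing the array to the VCPU count the hypervisor reports and using GCNEW_ARRAY() rather than a bare malloc(). This is only a fragment meant to sit inside the patch's sched_rt_domain_get() (gc, sdom, scinfo and i come from that function), and whether the array should be gc-scoped or left for the caller's _dispose() to free is a lifetime question for v2.

/* Fragment of sched_rt_domain_get() only, not standalone code. */
libxl_domain_sched_params_init(scinfo);
scinfo->sched = LIBXL_SCHEDULER_RT;
scinfo->rt.max_vcpus = sdom.num_vcpus;         /* exact count, not a legacy cap */
scinfo->rt.num_vcpus = sdom.num_vcpus;
GCNEW_ARRAY(scinfo->rt.vcpus, sdom.num_vcpus); /* gc-tracked, no bare malloc()  */
for (i = 0; i < sdom.num_vcpus; i++) {
    scinfo->rt.vcpus[i].period = sdom.vcpus[i].period;
    scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget;
}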
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 15:08 ` Dario Faggioli @ 2014-07-12 18:16 ` Meng Xu 2014-07-14 10:38 ` Dario Faggioli 2014-07-17 15:34 ` Ian Campbell 1 sibling, 1 reply; 47+ messages in thread From: Meng Xu @ 2014-07-12 18:16 UTC (permalink / raw) To: Dario Faggioli Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 1547 bytes --] Hi Dario, > > + if ( !scinfo->rt.vcpus ){ > > + LOGE(ERROR, "Allocate lib_vcpu array fails\n"); > > + return ERROR_INVAL; > > + } > > + scinfo->rt.num_vcpus = sdom.num_vcpus; > > + for( i = 0; i < sdom.num_vcpus; ++i) > > + { > > + scinfo->rt.vcpus[i].period = sdom.vcpus[i].period; > > + scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget; > > + } > > + > > + return 0; > > +} > > + > > +#define SCHED_RT_VCPU_PARAMS_MAX 4294967295 /* 2^32-1 */ > > + > I don't like the name. I'd go for something like > SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX. > > I will modify this. > You well can define both to the same value, of course. > > As per the value, yes, even if we turn the interface in usecs, 2^32 > usecs is still ~4000 secs, which ought to be enough as a period or > budget for everyone! :-) > > Actually, I'm not very sure about the range (max value) for the xl sched-rt interface now: I will change the type of period and budget in Xen to s_time_t which is signed long (64bits). (I can also change the type of period and budget in tool to be 64bits) My question is: Do we really need to do this range check and set the max value of the xl sched-rt to (2^64 - 1)/1000 (us)? (64 bits can have 2^61-1 ns, so it's (2^64-1)/1000 us) If we look at the actually range value (2^64-1)/1000 ns, it's equal to 5.8*10^11 years, which is too large to be realistic. Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 2238 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-12 18:16 ` Meng Xu @ 2014-07-14 10:38 ` Dario Faggioli 0 siblings, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-14 10:38 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell@citrix.com, xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 2579 bytes --] On sab, 2014-07-12 at 14:16 -0400, Meng Xu wrote: > > You well can define both to the same value, of course. > > As per the value, yes, even if we turn the interface in usecs, > 2^32 > usecs is still ~4000 secs, which ought to be enough as a > period or > budget for everyone! :-) > > > > Actually, I'm not very sure about the range (max value) for the xl > sched-rt interface now: I will change the type of period and budget in > Xen to s_time_t which is signed long (64bits). (I can also change the > type of period and budget in tool to be 64bits) > In libxc, you probably should transition to uint64_t, yes. In libxl, you probably will have to use 'integer', as sedf does: libxl_domain_sched_params = Struct("domain_sched_params",[ ("sched", libxl_scheduler), ("weight", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT'}), ("cap", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_CAP_DEFAULT'}), ("period", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT'}), ("slice", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT'}), ("latency", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT'}), ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}), ]) > My question is: > Do we really need to do this range check and set the max value of the > xl sched-rt to (2^64 - 1)/1000 (us)? (64 bits can have 2^61-1 ns, so > it's (2^64-1)/1000 us) > If we look at the actually range value (2^64-1)/1000 ns, it's equal to > 5.8*10^11 years, which is too large to be realistic. > I concur. It's quite hard to decide what is a sane upper bound for period and budget, as user may want to try doing all kind of crazy stuff! :-O For sure, if we want to put an upper bound, it has to come from some reasoning about the scheduler's usage, rather than from limitations coming from the actual type used (which makes sense, actually). If, for instance, you keep 2^32 (no matter whether you use 32 or 64 bits), you end up with, if I've done the math correctly, a bit more than 1 hour period and budget, which I think is fine, do you? Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 15:08 ` Dario Faggioli 2014-07-12 18:16 ` Meng Xu @ 2014-07-17 15:34 ` Ian Campbell 1 sibling, 0 replies; 47+ messages in thread From: Ian Campbell @ 2014-07-17 15:34 UTC (permalink / raw) To: Dario Faggioli Cc: xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb On Fri, 2014-07-11 at 17:08 +0200, Dario Faggioli wrote: > > + if ( !scinfo->rt.vcpus ){ > > + LOGE(ERROR, "Allocate lib_vcpu array fails\n"); > > + return ERROR_INVAL; > > + } > > + scinfo->rt.num_vcpus = sdom.num_vcpus; > > + for( i = 0; i < sdom.num_vcpus; ++i) > > + { > > + scinfo->rt.vcpus[i].period = sdom.vcpus[i].period; > > + scinfo->rt.vcpus[i].budget = sdom.vcpus[i].budget; > > + } > > + > > + return 0; > > +} > > + > > +#define SCHED_RT_VCPU_PARAMS_MAX 4294967295 /* 2^32-1 */ > > + > I don't like the name. I'd go for something like > SCHED_RT_VCPU_PERIOD_MAX and SCHED_RT_VCPU_BUDGET_MAX. > > You well can define both to the same value, of course. If you want the largest unsigned which can be represented in 32-bits then please use UINT32_MAX from stdint.h Ian. ^ permalink raw reply [flat|nested] 47+ messages in thread
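Putting Dario's and Ian's remarks together, the limits could be spelled as in the sketch below; the macro names follow Dario's earlier suggestion and are assumptions, not code from the series. UINT32_MAX microseconds is about 4295 seconds, i.e. roughly 72 minutes, which is the "a bit more than 1 hour" figure discussed above.

    #include <stdint.h>

    /* Sketch: per-parameter bounds via stdint.h rather than a hard-coded
     * 4294967295, assuming the interface settles on microseconds. */
    #define SCHED_RT_VCPU_PERIOD_MAX UINT32_MAX
    #define SCHED_RT_VCPU_BUDGET_MAX UINT32_MAX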
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-11 4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu 2014-07-11 11:05 ` Wei Liu 2014-07-11 15:08 ` Dario Faggioli @ 2014-07-17 15:36 ` Ian Campbell 2014-07-18 11:05 ` Meng Xu 2 siblings, 1 reply; 47+ messages in thread From: Ian Campbell @ 2014-07-17 15:36 UTC (permalink / raw) To: Meng Xu Cc: xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb On Fri, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/ > +#define LIBXL_XEN_LEGACY_MAX_VCPUS 32 As the name suggests this limit is legacy. Modern Xen and modern guests can support considerably more VCPUs than this. You need to base your interface on a dynamically sized/allocated array. Perhaps this needs to be carried down all the way to the domctl interface and into Xen, I haven't looked. Ian. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 3/4] libxl for rt scheduler 2014-07-17 15:36 ` Ian Campbell @ 2014-07-18 11:05 ` Meng Xu 0 siblings, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-18 11:05 UTC (permalink / raw) To: Ian Campbell Cc: xisisu@gmail.com, stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com, dario.faggioli@citrix.com, ian.jackson@eu.citrix.com, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu Hi Ian, Ian Campbell <Ian.Campbell@citrix.com> wrote on Thursday, July 17, 2014: > On Fri, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > > > +/* Consistent with XEN_LEGACY_MAX_VCPUS xen/arch-x86/xen.h*/ > > +#define LIBXL_XEN_LEGACY_MAX_VCPUS 32 > > As the name suggests this limit is legacy. Modern Xen and modern guests > can support considerably more VCPUs than this. > > You need to base your interface on a dynamically sized/allocated array. > Perhaps this needs to be carried down all the way to the domctl > interface and into Xen, I haven't looked. > Yes. You are right! I'm going to dynamically allocate the array, instead of using LIBXL_XEN_LEGACY_MAX_VCPUS, in the next version. Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania ^ permalink raw reply [flat|nested] 47+ messages in thread
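One possible shape for the dynamic allocation Meng describes, sketched as an assumption rather than the actual follow-up patch: query the domain's VCPU count first (here via libxl_dominfo's vcpu_max_id field) and size the array from that instead of LIBXL_XEN_LEGACY_MAX_VCPUS.

    /* Sketch of the dynamic sizing; error handling is abbreviated. */
    libxl_dominfo dominfo;
    int nr_vcpus;

    libxl_dominfo_init(&dominfo);
    if (libxl_domain_info(CTX, &dominfo, domid)) {
        libxl_dominfo_dispose(&dominfo);
        return ERROR_FAIL;
    }
    nr_vcpus = dominfo.vcpu_max_id + 1;
    libxl_dominfo_dispose(&dominfo);

    GCNEW_ARRAY(scinfo->rt.vcpus, nr_vcpus);
    scinfo->rt.num_vcpus = nr_vcpus;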
* [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu ` (2 preceding siblings ...) 2014-07-11 4:49 ` [PATCH RFC v1 3/4] libxl " Meng Xu @ 2014-07-11 4:49 ` Meng Xu 2014-07-11 14:49 ` Dario Faggioli 2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu 2014-07-11 16:19 ` Dario Faggioli 5 siblings, 1 reply; 47+ messages in thread From: Meng Xu @ 2014-07-11 4:49 UTC (permalink / raw) To: xen-devel Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xumengpanda, Meng Xu, lichong659, dgolomb Add xc_sched_rt_* functions to interact with Xen to set/get domain's parameters for rt scheduler. Signed-off-by: Sisu Xi <xisisu@gmail.com> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> --- tools/libxc/Makefile | 1 + tools/libxc/xc_rt.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++ tools/libxc/xenctrl.h | 7 +++++ 3 files changed, 89 insertions(+) create mode 100644 tools/libxc/xc_rt.c diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index f77677c..7ad6dea 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -20,6 +20,7 @@ CTRL_SRCS-y += xc_sedf.c CTRL_SRCS-y += xc_csched.c CTRL_SRCS-y += xc_csched2.c CTRL_SRCS-y += xc_arinc653.c +CTRL_SRCS-y += xc_rt.c CTRL_SRCS-y += xc_tbuf.c CTRL_SRCS-y += xc_pm.c CTRL_SRCS-y += xc_cpu_hotplug.c diff --git a/tools/libxc/xc_rt.c b/tools/libxc/xc_rt.c new file mode 100644 index 0000000..a3e76a1 --- /dev/null +++ b/tools/libxc/xc_rt.c @@ -0,0 +1,81 @@ +/**************************************************************************** + * + * File: xc_rt.c + * Author: Sisu Xi + * Meng Xu + * + * Description: XC Interface to the rt scheduler + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "xc_private.h" + +int +xc_sched_rt_domain_set( + xc_interface *xch, + uint32_t domid, + struct xen_domctl_sched_rt_params *sdom) +{ + int rc; + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BOUNCE(sdom, + sizeof(*sdom), + XC_HYPERCALL_BUFFER_BOUNCE_IN); + + if ( xc_hypercall_bounce_pre(xch, sdom) ) + return -1; + + domctl.cmd = XEN_DOMCTL_scheduler_op; + domctl.domain = (domid_t) domid; + domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT; + domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo; + set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom); + + rc = do_domctl(xch, &domctl); + + xc_hypercall_bounce_post(xch, sdom); + + return rc; +} + +int +xc_sched_rt_domain_get( + xc_interface *xch, + uint32_t domid, + struct xen_domctl_sched_rt_params *sdom) +{ + int rc; + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BOUNCE(sdom, + sizeof(*sdom), + XC_HYPERCALL_BUFFER_BOUNCE_OUT); + + if ( xc_hypercall_bounce_pre(xch, sdom) ) + return -1; + + domctl.cmd = XEN_DOMCTL_scheduler_op; + domctl.domain = (domid_t) domid; + domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT; + domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_getinfo; + set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom); + + rc = do_domctl(xch, &domctl); + + xc_hypercall_bounce_post(xch, sdom); + + return rc; +} + diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 3578b09..b7b0c04 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -865,6 +865,13 @@ int xc_sched_credit2_domain_get(xc_interface *xch, uint32_t domid, struct xen_domctl_sched_credit2 *sdom); +int xc_sched_rt_domain_set(xc_interface *xch, + uint32_t domid, + struct xen_domctl_sched_rt_params *sdom); +int xc_sched_rt_domain_get(xc_interface *xch, + uint32_t domid, + struct xen_domctl_sched_rt_params *sdom); + int xc_sched_arinc653_schedule_set( xc_interface *xch, -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
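For context, a hedged sketch of how a caller would drive the two wrappers added here; the assumption that xen_domctl_sched_rt_params carries a num_vcpus count plus a vcpus[] array of period/budget pairs comes from the fields used in the libxl discussion above, and xch/domid are taken to be already set up by the caller.

    /* Sketch: read-modify-write of one VCPU's parameters via the wrappers. */
    struct xen_domctl_sched_rt_params sdom;
    int rc;

    rc = xc_sched_rt_domain_get(xch, domid, &sdom);
    if ( rc )
        return rc;

    if ( sdom.num_vcpus > 1 )
    {
        sdom.vcpus[1].period = 20;    /* in the RFC's millisecond units */
        sdom.vcpus[1].budget = 10;
        rc = xc_sched_rt_domain_set(xch, domid, &sdom);
    }

    return rc;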
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu @ 2014-07-11 14:49 ` Dario Faggioli 2014-07-11 16:23 ` Meng Xu 2014-07-17 15:29 ` Ian Campbell 0 siblings, 2 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 14:49 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 3550 bytes --] On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > Add xc_sched_rt_* functions to interact with Xen to set/get domain's > --- /dev/null > +++ b/tools/libxc/xc_rt.c > +#include "xc_private.h" > + > +int > +xc_sched_rt_domain_set( > + xc_interface *xch, > + uint32_t domid, > + struct xen_domctl_sched_rt_params *sdom) > So, both here... > +{ > + int rc; > + DECLARE_DOMCTL; > + DECLARE_HYPERCALL_BOUNCE(sdom, > + sizeof(*sdom), > + XC_HYPERCALL_BUFFER_BOUNCE_IN); > + ... and here... I know libxc is not terribly consistent when it comes to coding style. Still, I like stuff to be aligned to the opening brace of the function/macro. Have a look at xc_domain.c, e.g., ... int xc_vcpu_setaffinity(xc_interface *xch, uint32_t domid, int vcpu, xc_cpumap_t cpumap_hard_inout, xc_cpumap_t cpumap_soft_inout, uint32_t flags) { DECLARE_DOMCTL; DECLARE_HYPERCALL_BOUNCE(cpumap_hard_inout, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH); DECLARE_HYPERCALL_BOUNCE(cpumap_soft_inout, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH); ... > + if ( xc_hypercall_bounce_pre(xch, sdom) ) > + return -1; > + > + domctl.cmd = XEN_DOMCTL_scheduler_op; > + domctl.domain = (domid_t) domid; > + domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT; > + domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo; > + set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom); > + > + rc = do_domctl(xch, &domctl); > + > + xc_hypercall_bounce_post(xch, sdom); > + So, the bouncing logic seems fine. Looking at what other schedulers do, there should be no particular need for bouncing anything. xen/include/public/domctl.h: struct xen_domctl_scheduler_op { uint32_t sched_id; /* XEN_SCHEDULER_* */ uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ union { struct xen_domctl_sched_sedf { uint64_aligned_t period; uint64_aligned_t slice; uint64_aligned_t latency; uint32_t extratime; uint32_t weight; } sedf; struct xen_domctl_sched_credit { uint16_t weight; uint16_t cap; } credit; struct xen_domctl_sched_credit2 { uint16_t weight; } credit2; } u; }; I see you have a TODO item relate to this interface, so I guess I'm leaving this alone. My view here is, since per-VCPU scheduling parameters are important for this scheduler, you either: - provide an interface for getting and setting the parameters of _one_ _VCPU_, and the call them repeatedly, if for instance you need to act on a domain. In this case, no array and no bouncing should be necessary. - provide both the above, and another interface, that one with an array as big as the number of VCPUs of the domain, and use it to provide the hypervisor with the parameters of all the VCPUs, and act appropriately inside Xen. Personally, I'd start with the former, and add the latter later, if we deem it necessary. 
Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
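To make the first of Dario's two options concrete, a possible signature for a single-VCPU variant is sketched below; none of these names exist in the posted series, they are purely illustrative.

    /* Hypothetical per-VCPU wrappers: one hypercall gets or sets the
     * parameters of a single VCPU, so no array and no bounce buffer
     * is needed. */
    int xc_sched_rt_vcpu_get(xc_interface *xch, uint32_t domid, uint32_t vcpuid,
                             uint64_t *period, uint64_t *budget);
    int xc_sched_rt_vcpu_set(xc_interface *xch, uint32_t domid, uint32_t vcpuid,
                             uint64_t period, uint64_t budget);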
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 14:49 ` Dario Faggioli @ 2014-07-11 16:23 ` Meng Xu 2014-07-11 16:35 ` Dario Faggioli 2014-07-17 15:29 ` Ian Campbell 1 sibling, 1 reply; 47+ messages in thread From: Meng Xu @ 2014-07-11 16:23 UTC (permalink / raw) To: Dario Faggioli Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 4472 bytes --] Hi Dario, 2014-07-11 10:49 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>: > On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > > Add xc_sched_rt_* functions to interact with Xen to set/get domain's > > > --- /dev/null > > +++ b/tools/libxc/xc_rt.c > > > +#include "xc_private.h" > > + > > +int > > +xc_sched_rt_domain_set( > > + xc_interface *xch, > > + uint32_t domid, > > + struct xen_domctl_sched_rt_params *sdom) > > > So, both here... > > > +{ > > + int rc; > > + DECLARE_DOMCTL; > > + DECLARE_HYPERCALL_BOUNCE(sdom, > > + sizeof(*sdom), > > + XC_HYPERCALL_BUFFER_BOUNCE_IN); > > + > ... and here... I know libxc is not terribly consistent when it comes to > coding style. Still, I like stuff to be aligned to the opening brace of > the function/macro. > > Have a look at xc_domain.c, e.g., > > ... > int xc_vcpu_setaffinity(xc_interface *xch, > uint32_t domid, > int vcpu, > xc_cpumap_t cpumap_hard_inout, > xc_cpumap_t cpumap_soft_inout, > uint32_t flags) > { > DECLARE_DOMCTL; > DECLARE_HYPERCALL_BOUNCE(cpumap_hard_inout, 0, > XC_HYPERCALL_BUFFER_BOUNCE_BOTH); > DECLARE_HYPERCALL_BOUNCE(cpumap_soft_inout, 0, > XC_HYPERCALL_BUFFER_BOUNCE_BOTH); > ... > It's so nice of you to giving me such an example! I will modify them in the next version. > > > + if ( xc_hypercall_bounce_pre(xch, sdom) ) > > + return -1; > > + > > + domctl.cmd = XEN_DOMCTL_scheduler_op; > > + domctl.domain = (domid_t) domid; > > + domctl.u.scheduler_op.sched_id = XEN_SCHEDULER_RT; > > + domctl.u.scheduler_op.cmd = XEN_DOMCTL_SCHEDOP_putinfo; > > + set_xen_guest_handle(domctl.u.scheduler_op.u.rt.schedule, sdom); > > + > > + rc = do_domctl(xch, &domctl); > > + > > + xc_hypercall_bounce_post(xch, sdom); > > + > So, the bouncing logic seems fine. Looking at what other schedulers do, > there should be no particular need for bouncing anything. > > xen/include/public/domctl.h: > > struct xen_domctl_scheduler_op { > uint32_t sched_id; /* XEN_SCHEDULER_* */ > uint32_t cmd; /* XEN_DOMCTL_SCHEDOP_* */ > union { > struct xen_domctl_sched_sedf { > uint64_aligned_t period; > uint64_aligned_t slice; > uint64_aligned_t latency; > uint32_t extratime; > uint32_t weight; > } sedf; > struct xen_domctl_sched_credit { > uint16_t weight; > uint16_t cap; > } credit; > struct xen_domctl_sched_credit2 { > uint16_t weight; > } credit2; > } u; > }; > > I see you have a TODO item relate to this interface, so I guess I'm > leaving this alone. > > My view here is, since per-VCPU scheduling parameters are important for > this scheduler, you either: > - provide an interface for getting and setting the parameters of _one_ > _VCPU_, and the call them repeatedly, if for instance you need to act > on a domain. In this case, no array and no bouncing should be > necessary. > - provide both the above, and another interface, that one with an array > as big as the number of VCPUs of the domain, and use it to provide > the hypervisor with the parameters of all the VCPUs, and act > appropriately inside Xen. 
> > Personally, I'd start with the former, and add the latter later, if we > deem it necessary. > Actually, we did the former one in the very early version of the rt scheduler. The only concern I have about the former version is that it has to issue the hypercall many times to get all VCPUs' information of a domain, while the latter one only issues the hypercall once with bouncing. My concern is: Will the former have worse performance than the latter? Thanks, Meng > > Regards, > Dario > > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) > > -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 16:23 ` Meng Xu @ 2014-07-11 16:35 ` Dario Faggioli 2014-07-11 16:49 ` Andrew Cooper 2014-07-12 19:46 ` Meng Xu 0 siblings, 2 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 16:35 UTC (permalink / raw) To: Meng Xu Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb An unrelated thing: can you set your mail client to send text messages, instead of HTML ones? On ven, 2014-07-11 at 12:23 -0400, Meng Xu wrote: > My view here is, since per-VCPU scheduling parameters are > important for > this scheduler, you either: > - provide an interface for getting and setting the parameters > of _one_ > _VCPU_, and the call them repeatedly, if for instance you > need to act > on a domain. In this case, no array and no bouncing should > be > necessary. > - provide both the above, and another interface, that one > with an array > as big as the number of VCPUs of the domain, and use it to > provide > the hypervisor with the parameters of all the VCPUs, and > act > appropriately inside Xen. > > Personally, I'd start with the former, and add the latter > later, if we > deem it necessary. > > > Actually, we did the former one in the very early version of the rt > scheduler. The only concern I have about the former version is that it > has to issue the hypercall many times to get all VCPUs' > information of a domain, while the latter one only issues the hypercall > once with bouncing. > True. However, if you only want to get or set one particular vcpu, it's annoying to have to deal with the array, isn't it? So, both solutions have up and down sides, depending on what you actually need. :-) This is why I was suggesting to implement both. Of course, it does not matter much which one you implement first. So, feel free to implement the array variant properly first, and then we'll see if we also need the single VCPU variant. > My concern is: Will the former have worse performance than the > latter? > Well, Xen has its tricks, but yes, I think performing e.g., 8 or 16 hypercalls in a loop is worse than issuing one returning an array. It also must be noted that, getting and setting scheduling parameters, should not happen in very critical path, so it probably won't matter much. Anyway, as said above, go for the array first if you like it better. When you do that, consider Andrew's comments to the bottom of patch 1: "As a gut feel, I think that these two ints should be part of the struct xen_domctl_sched_rt below, with a guest handle pointing to an array of { period, budget } structures, which avoids the LEGACY_MAX limit." I don't think you'll need vcpu_index any longer, and I'm not sure whether you need max_vcpus to be part of the interface of this hcall, or if you're better off fetching it from some other call, though.
Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 16:35 ` Dario Faggioli @ 2014-07-11 16:49 ` Andrew Cooper 0 siblings, 0 replies; 47+ messages in thread From: Andrew Cooper @ 2014-07-11 16:49 UTC (permalink / raw) To: Dario Faggioli, Meng Xu Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb On 11/07/14 17:35, Dario Faggioli wrote: > >> My concern is: Will the former have worse performance than the >> latter? >> > Well, Xen has its tricks, but yes, I think performing e.g., 8 or 16 > hypercalls in a loop is worse than issuing one returning an array. As far as hypercalls go, the path to Xen is a long one. From userspace, a hypercall involves a context switch into the kernel (which is a double context switch through Xen) and a context switch from the kernel into Xen, and back again. The return from guest kernel to guest userspace requires a TLB flush. So, 6 context switches and a TLB flush per hypercall (assuming a 64bit dom0). Use batches wherever possible, even for 2 items. It will be about twice as fast as not batching. ~Andrew ^ permalink raw reply [flat|nested] 47+ messages in thread
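To put Andrew's point in code, the difference is simply N privilege transitions versus one; both function names below are the hypothetical wrappers discussed in this sub-thread, not code from the series.

    /* Per-VCPU variant: one domctl per VCPU, so a 16-VCPU domain pays the
     * full user -> kernel -> Xen round trip 16 times. */
    for ( v = 0; v < nr_vcpus; v++ )
        xc_sched_rt_vcpu_get(xch, domid, v, &period[v], &budget[v]);

    /* Batched variant: one domctl carries the whole array. */
    xc_sched_rt_domain_get(xch, domid, &sdom);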
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 16:35 ` Dario Faggioli 2014-07-11 16:49 ` Andrew Cooper @ 2014-07-12 19:46 ` Meng Xu 1 sibling, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-12 19:46 UTC (permalink / raw) To: Dario Faggioli Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb Hi Dario, > > > My concern is: Will the former have worse performance than the > > latter? > > > Well, Xen has its tricks, but yes, I think performing e.g., 8 or 16 > hypercalls in a loop is worse than issuing one returning an array. > > It also must be noted that, getting and setting scheduling parameters, > should not happen in very critical path, so it probably won't matter > much. > > Anyway, as said above, go for the array first if you like it better. > When you do that, consider Andrew's comments to the bottom of patch 1: > > "As a gut feel, I think that these two ints should be part of the struct > xen_domctl_sched_rt below, with a guest handle pointing to an array of { > period, budget } structures, which avoids the LEGACY_MAX limit." > > I don't think you'll need vcpu_index any longer, and I'm not sure > whether you need max_vcpus to be part of the interface of this hcall, or > if you're better off fetching it from some other call, though. > > Yes, I don't need the max_vcpus any more after I improve this interface. I think I will use the array for the get operation since, as Andrew mentioned, batching is better than doing hypercalls one by one when more than one hypercall is involved. As to the set operation, I think I can avoid using an array. Anyway, I will do this first and let people decide if we need to make further changes. :-) (BTW, I will modify the code based on your other comments in the previous email.) Thanks, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-11 14:49 ` Dario Faggioli 2014-07-11 16:23 ` Meng Xu @ 2014-07-17 15:29 ` Ian Campbell 2014-07-17 15:34 ` George Dunlap 1 sibling, 1 reply; 47+ messages in thread From: Ian Campbell @ 2014-07-17 15:29 UTC (permalink / raw) To: Dario Faggioli Cc: xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > So, the bouncing logic seems fine. Looking at what other schedulers do, > there should be no particular need for bouncing anything. Seems like there is some confusing precedent around the use of sysctl vs domctl for sched parameters here. Most schedulers use domctl but arinc uses sysctl. All the existing domctl variants use the "inline parameters" scheme which you quoted, where as arcinc+sysctl uses a pointer to a struct and bouncing. This new interface seems to combine the two and uses a domctl with a pointer. I think it should follow either the existing sysctl or domctl, not mix the two forms. Also looking at the new interface added in the first patch I don't think that array of XEN_LEGACY_MAX_VCPUS elements is going to be sufficient. Ian. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-17 15:29 ` Ian Campbell @ 2014-07-17 15:34 ` George Dunlap 2014-07-17 22:16 ` Meng Xu 2014-07-18 9:47 ` Ian Campbell 0 siblings, 2 replies; 47+ messages in thread From: George Dunlap @ 2014-07-17 15:34 UTC (permalink / raw) To: Ian Campbell Cc: Sisu Xi, Stefano Stabellini, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Meng Xu, lichong659, dgolomb On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > >> So, the bouncing logic seems fine. Looking at what other schedulers do, >> there should be no particular need for bouncing anything. > > Seems like there is some confusing precedent around the use of sysctl vs > domctl for sched parameters here. > > Most schedulers use domctl but arinc uses sysctl. They're controlling different things. domctl is controlling parameters related to *a particular domain* (for instance, weight or cap); sysctl is relating to setting parameters *for the scheduler as a whole* (for example, timeslice). -George ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-17 15:34 ` George Dunlap @ 2014-07-17 22:16 ` Meng Xu 2014-07-18 9:49 ` Dario Faggioli 2014-07-18 9:51 ` Ian Campbell 1 sibling, 2 replies; 47+ messages in thread From: Meng Xu @ 2014-07-17 22:16 UTC (permalink / raw) To: George Dunlap Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu Hi Ian and George, George Dunlap <George.Dunlap@eu.citrix.com> wrote on Thursday, July 17, 2014: > On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com> > wrote: > > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > > > >> So, the bouncing logic seems fine. Looking at what other schedulers do, > >> there should be no particular need for bouncing anything. > > > > Seems like there is some confusing precedent around the use of sysctl vs > > domctl for sched parameters here. > > > > Most schedulers use domctl but arinc uses sysctl. > > They're controlling different things. > > domctl is controlling parameters related to *a particular domain* (for > instance, weight or cap); sysctl is relating to setting parameters > *for the scheduler as a whole* (for example, timeslice). > > Do we have to use inline parameters in domctl? Right now, we used the domctl to set/get the parameters of each vcpu of a domain. So from what the functionality does, it should be in domctl. If we don't have to use inline parameters, I would prefer to use domctl. :-) The reason why the ARINC653 scheduler uses sysctl to set parameters is as George said: it sets the scheduling sequence for the ARINC653 scheduler, instead of setting the parameters of a domain. Thank you very much! Best, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-17 22:16 ` Meng Xu @ 2014-07-18 9:49 ` Dario Faggioli 2014-07-18 9:51 ` Ian Campbell 1 sibling, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-18 9:49 UTC (permalink / raw) To: Meng Xu Cc: Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 2618 bytes --] On gio, 2014-07-17 at 18:16 -0400, Meng Xu wrote: > Hi Ian and George, > > George Dunlap <George.Dunlap@eu.citrix.com>于2014年7月17日星期四写道: > On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell > <Ian.Campbell@citrix.com> wrote: > > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > > > >> So, the bouncing logic seems fine. Looking at what other > schedulers do, > >> there should be no particular need for bouncing anything. > > > > Seems like there is some confusing precedent around the use > of sysctl vs > > domctl for sched parameters here. > > > > Most schedulers use domctl but arinc uses sysctl. > > They're controlling different things. > > domctl is controlling parameters related to *a particular > domain* (for > instance, weight or cap); sysctl is relating to setting > parameters > *for the scheduler as a whole* (for example, timeslice). > > > Do we have to use inline parameters in domctl? > I think you should use inline parameters as much as possible. Of course, if you need to transfer the parameters for N vcpus (of the same domain) all at once, you can use an array (i.e., an handle, of course), but only for those parameters. So, if you, for instance, you can have something like this: { integer: num_vcpus array[]: { time_t: budget, time_t: period } } This, in case you want to batch the transfer of all the parameters for all the vcpus in one hcall. BTW, I'm not sure you need num_vcpu to be there, it depends if you have it available already from other hcall performed before, etc. For an hypercall that only retrieve the parameters for _one_ _vcpu_, it should be something like this: { time_t budget time_t period } (forgive the meta-language) You don't have to implement the second hypercall if you think we don't need it for now, it was just an example. > Right now, we used the domctl to set/get the parameters of each vcpu > of a domain. So from what the functionality does, it should be in > domctl. If we don't have to use inline parameters, I would prefer to > use domctl. :-) > You shall use domctl! :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
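Rendered as C, the two payload shapes Dario sketches in meta-language would be roughly as below; all names are placeholders, and Ian's follow-up message turns the batched variant into a concrete domctl.

    /* Single-VCPU payload: only that VCPU's parameters, no array needed. */
    struct xen_domctl_sched_rt_vcpu {
        uint64_aligned_t period;    /* unit (ns/us/ms) still under discussion */
        uint64_aligned_t budget;
    };
    typedef struct xen_domctl_sched_rt_vcpu xen_domctl_sched_rt_vcpu_t;
    DEFINE_XEN_GUEST_HANDLE(xen_domctl_sched_rt_vcpu_t);

    /* Batched payload: one hypercall carries the parameters of every VCPU. */
    struct xen_domctl_sched_rt_domain {
        uint16_t num_vcpus;         /* possibly redundant, as noted above */
        XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_t) vcpus;
    };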
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-17 22:16 ` Meng Xu 2014-07-18 9:49 ` Dario Faggioli @ 2014-07-18 9:51 ` Ian Campbell 2014-07-18 12:11 ` Meng Xu 1 sibling, 1 reply; 47+ messages in thread From: Ian Campbell @ 2014-07-18 9:51 UTC (permalink / raw) To: Meng Xu Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu On Thu, 2014-07-17 at 18:16 -0400, Meng Xu wrote: > Hi Ian and George, > > George Dunlap <George.Dunlap@eu.citrix.com>于2014年7月17日星期四写道: > On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell > <Ian.Campbell@citrix.com> wrote: > > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > > > >> So, the bouncing logic seems fine. Looking at what other > schedulers do, > >> there should be no particular need for bouncing anything. > > > > Seems like there is some confusing precedent around the use > of sysctl vs > > domctl for sched parameters here. > > > > Most schedulers use domctl but arinc uses sysctl. > > They're controlling different things. > > domctl is controlling parameters related to *a particular > domain* (for > instance, weight or cap); sysctl is relating to setting > parameters > *for the scheduler as a whole* (for example, timeslice). > > > Do we have to use inline parameters in domctl? > > > Right now, we used the domctl to set/get the parameters of each vcpu > of a domain. So from what the functionality does, it should be in > domctl. If we don't have to use inline parameters, I would prefer to > use domctl. :-) The important thing is that the array of per-VCPU parameters needs to be a GUEST_HANDLE and not a statically sized array with some arbitrary size. AFAICT that means that the xen_domctl_shceduler_op change should be: +/* + * This structure is used to pass to rt scheduler from a + * privileged domain to Xen + */ +struct xen_domctl_sched_rt_vcpus_params { + signed long period; /* s_time_t type */ + signed long budget; +}; +typedef struct xen_domctl_sched_rt... +DEFINE_XEN_GUE.... @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { struct xen_domctl_sched_credit2 { uint16_t weight; } credit2; + struct xen_domctl_sched_rt { + type some_per_domain_parameter; + uint16_t nr_vcpus; + uint16_t vcpu_index; + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_params_t) vcpu; + } rt; } u; Currently you don't appear to have any per_domain_parameters, but I have included one as an illustration. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-18 9:51 ` Ian Campbell @ 2014-07-18 12:11 ` Meng Xu 0 siblings, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-18 12:11 UTC (permalink / raw) To: Ian Campbell Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, lichong659@gmail.com, dgolomb@seas.upenn.edu [-- Attachment #1.1: Type: text/plain, Size: 2904 bytes --] Hi Ian, Ian Campbell <Ian.Campbell@citrix.com>于2014年7月18日星期五写道: > On Thu, 2014-07-17 at 18:16 -0400, Meng Xu wrote: > > Hi Ian and George, > > > > George Dunlap <George.Dunlap@eu.citrix.com <javascript:;> > >于2014年7月17日星期四写道: > > On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell > > <Ian.Campbell@citrix.com <javascript:;>> wrote: > > > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > > > > > >> So, the bouncing logic seems fine. Looking at what other > > schedulers do, > > >> there should be no particular need for bouncing anything. > > > > > > Seems like there is some confusing precedent around the use > > of sysctl vs > > > domctl for sched parameters here. > > > > > > Most schedulers use domctl but arinc uses sysctl. > > > > They're controlling different things. > > > > domctl is controlling parameters related to *a particular > > domain* (for > > instance, weight or cap); sysctl is relating to setting > > parameters > > *for the scheduler as a whole* (for example, timeslice). > > > > > > Do we have to use inline parameters in domctl? > > > > > > Right now, we used the domctl to set/get the parameters of each vcpu > > of a domain. So from what the functionality does, it should be in > > domctl. If we don't have to use inline parameters, I would prefer to > > use domctl. :-) > > The important thing is that the array of per-VCPU parameters needs to be > a GUEST_HANDLE and not a statically sized array with some arbitrary > size. > > AFAICT that means that the xen_domctl_shceduler_op change should be: > +/* > + * This structure is used to pass to rt scheduler from a > + * privileged domain to Xen > + */ > +struct xen_domctl_sched_rt_vcpus_params { > + signed long period; /* s_time_t type */ > + signed long budget; > +}; > +typedef struct xen_domctl_sched_rt... > +DEFINE_XEN_GUE.... > > @@ -367,6 +383,9 @@ struct xen_domctl_scheduler_op { > struct xen_domctl_sched_credit2 { > uint16_t weight; > } credit2; > + struct xen_domctl_sched_rt { > + type some_per_domain_parameter; > + uint16_t nr_vcpus; > + uint16_t vcpu_index; > + XEN_GUEST_HANDLE_64(xen_domctl_sched_rt_vcpu_params_t) vcpu; > + } rt; > } u; > > Currently you don't appear to have any per_domain_parameters, but I have > included one as an illustration. > > Thank you so much for your detailed explanation! This is exactly what I'm doing right now. :-) It will be in the next version of the patch. :-) Best, Meng -- ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania [-- Attachment #1.2: Type: text/html, Size: 3801 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-17 15:34 ` George Dunlap 2014-07-17 22:16 ` Meng Xu @ 2014-07-18 9:47 ` Ian Campbell 2014-07-18 10:00 ` Dario Faggioli 1 sibling, 1 reply; 47+ messages in thread From: Ian Campbell @ 2014-07-18 9:47 UTC (permalink / raw) To: George Dunlap Cc: Sisu Xi, Stefano Stabellini, Dario Faggioli, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Meng Xu, lichong659, dgolomb On Thu, 2014-07-17 at 16:34 +0100, George Dunlap wrote: > On Thu, Jul 17, 2014 at 4:29 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Fri, 2014-07-11 at 16:49 +0200, Dario Faggioli wrote: > > > >> So, the bouncing logic seems fine. Looking at what other schedulers do, > >> there should be no particular need for bouncing anything. > > > > Seems like there is some confusing precedent around the use of sysctl vs > > domctl for sched parameters here. > > > > Most schedulers use domctl but arinc uses sysctl. > > They're controlling different things. > > domctl is controlling parameters related to *a particular domain* (for > instance, weight or cap); sysctl is relating to setting parameters > *for the scheduler as a whole* (for example, timeslice). Ah, that makes sense. It seems that arinc handles some degree of per-dom settings via the sysctl interface though, since xen_sysctl_arinc653_schedule takes an array of things which contain domid+vcpuid+params. Perhaps that's peculiar to the way arinc works though. (Although it seems rather incorrect to me for the array to be fixed size instead of a guest handle...) Lets not repeat that mistake with this new scheduler, if the domctl (or a sysctl for that matter) needs per VCPU information then it should IMHO take the parameters inline in the struct (like the others), anything dynamic like the per-vcpu stuff should then be a GUEST_HANDLE pointer to a dynamically sized array (accessed with copy_from_guest_offset et al). Ian. ^ permalink raw reply [flat|nested] 47+ messages in thread
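A hedged sketch of what "accessed with copy_from_guest_offset et al" looks like on the hypervisor side, reusing the struct and field names from Ian's illustration earlier in the thread; the handler name and the abbreviated error handling are invented for this example.

    /* Sketch: the domctl handler walks the caller-supplied array through the
     * guest handle, one element at a time, instead of relying on a fixed-size
     * array embedded in the domctl. */
    static int rt_dom_set_vcpus(struct domain *d,
                                struct xen_domctl_scheduler_op *op)
    {
        struct xen_domctl_sched_rt_vcpus_params local;
        uint16_t i;

        for ( i = 0; i < op->u.rt.nr_vcpus; i++ )
        {
            if ( copy_from_guest_offset(&local, op->u.rt.vcpu, i, 1) )
                return -EFAULT;
            /* ... validate and apply local.period / local.budget to
             *     VCPU i of domain d ... */
        }

        return 0;
    }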
* Re: [PATCH RFC v1 4/4] libxc for rt scheduler 2014-07-18 9:47 ` Ian Campbell @ 2014-07-18 10:00 ` Dario Faggioli 0 siblings, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-18 10:00 UTC (permalink / raw) To: Ian Campbell Cc: Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Meng Xu, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --] On ven, 2014-07-18 at 10:47 +0100, Ian Campbell wrote: > Lets not repeat that mistake with this new scheduler, if the domctl (or > a sysctl for that matter) needs per VCPU information then it should IMHO > take the parameters inline in the struct (like the others), anything > dynamic like the per-vcpu stuff should then be a GUEST_HANDLE pointer to > a dynamically sized array (accessed with copy_from_guest_offset et al). > Indeed. There is one other subtle difference, between the new scheduler Meng is working on, and the existing ones (even leaving ARINC out). In fact, SEDF, Credit and Credit2, only allows per domain parameters, while Meng's scheduler supports per-vcpu parameters, and hence he needs a guest handle to deal with the array of these per-vcpu parameters. BTW, while writing this, I saw your <1405677108.13883.9.camel@kazak.uk.xensource.com>, which goes straight to the point I was trying to make, and with which I totally concur. :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Introduce rt real-time scheduler for Xen 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu ` (3 preceding siblings ...) 2014-07-11 4:49 ` [PATCH RFC v1 4/4] libxc " Meng Xu @ 2014-07-11 10:50 ` Wei Liu 2014-07-11 11:06 ` Dario Faggioli 2014-07-11 16:19 ` Dario Faggioli 5 siblings, 1 reply; 47+ messages in thread From: Wei Liu @ 2014-07-11 10:50 UTC (permalink / raw) To: Meng Xu Cc: wei.liu2, ian.campbell, xisisu, stefano.stabellini, george.dunlap, dario.faggioli, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote: [...] > > [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor > [PATCH RFC v1 2/4] xl for rt scheduler > [PATCH RFC v1 3/4] libxl for rt scheduler > [PATCH RFC v1 4/4] libxc for rt scheduler > I have some general comments on how you arrange these patches. At a glance of the title and code you should do them in the order of 1, 4, 3 and 2. Apparently xl depends on libxl, libxl depends on libxc, and libxc depends on hypervisor. You will break bisection with current ordering. And we normally write titles like xen: add rt scheduler libxl: introduce rt scheduler xl: XXXX etc. start with component name and separate with colon. Last but not least, you need to CC relevant maintainers. You can find out maintainers with scripts/get_maintainers.pl. Wei. > ----------- > Meng Xu > PhD Student in Computer and Information Science > University of Pennsylvania > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Introduce rt real-time scheduler for Xen 2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu @ 2014-07-11 11:06 ` Dario Faggioli 2014-07-11 16:14 ` Meng Xu 0 siblings, 1 reply; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 11:06 UTC (permalink / raw) To: Wei Liu Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, Meng Xu, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 1447 bytes --] On ven, 2014-07-11 at 11:50 +0100, Wei Liu wrote: > On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote: > [...] > > > > [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor > > [PATCH RFC v1 2/4] xl for rt scheduler > > [PATCH RFC v1 3/4] libxl for rt scheduler > > [PATCH RFC v1 4/4] libxc for rt scheduler > > > > I have some general comments on how you arrange these patches. > > At a glance of the title and code you should do them in the order of 1, > 4, 3 and 2. Apparently xl depends on libxl, libxl depends on libxc, and > libxc depends on hypervisor. You will break bisection with current > ordering. > Yep, I agree with Wei. > And we normally write titles like > xen: add rt scheduler > libxl: introduce rt scheduler > xl: XXXX > etc. > start with component name and separate with colon. > Indeed we do, and this helps quite a bit. > Last but not least, you need to CC relevant maintainers. You can find > out maintainers with scripts/get_maintainers.pl. > Yes, but this one, Meng almost got it right, I think. Basically, Meng, you're missing hypervisors maintainers (at least for patch 1). :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Introduce rt real-time scheduler for Xen 2014-07-11 11:06 ` Dario Faggioli @ 2014-07-11 16:14 ` Meng Xu 0 siblings, 0 replies; 47+ messages in thread From: Meng Xu @ 2014-07-11 16:14 UTC (permalink / raw) To: Dario Faggioli Cc: Wei Liu, Ian Campbell, Sisu Xi, Stefano Stabellini, George Dunlap, Ian Jackson, xen-devel@lists.xen.org, Meng Xu, Chong Li, Dagaen Golomb [-- Attachment #1.1: Type: text/plain, Size: 1524 bytes --] Hi Wei and Dario, 2014-07-11 7:06 GMT-04:00 Dario Faggioli <dario.faggioli@citrix.com>: > On ven, 2014-07-11 at 11:50 +0100, Wei Liu wrote: > > On Fri, Jul 11, 2014 at 12:49:54AM -0400, Meng Xu wrote: > > [...] > > > > > > [PATCH RFC v1 1/4] rt: Add rt scheduler to hypervisor > > > [PATCH RFC v1 2/4] xl for rt scheduler > > > [PATCH RFC v1 3/4] libxl for rt scheduler > > > [PATCH RFC v1 4/4] libxc for rt scheduler > > > > > > > I have some general comments on how you arrange these patches. > > > > At a glance of the title and code you should do them in the order of 1, > > 4, 3 and 2. Apparently xl depends on libxl, libxl depends on libxc, and > > libxc depends on hypervisor. You will break bisection with current > > ordering. > > > Yep, I agree with Wei. > > > And we normally write titles like > > xen: add rt scheduler > > libxl: introduce rt scheduler > > xl: XXXX > > etc. > > start with component name and separate with colon. > > > Indeed we do, and this helps quite a bit. > > > Last but not least, you need to CC relevant maintainers. You can find > > out maintainers with scripts/get_maintainers.pl. > > > Yes, but this one, Meng almost got it right, I think. > > Basically, Meng, you're missing hypervisors maintainers (at least for > patch 1). :-) > > Thank you very much for your advice! I will modify them in the next version of the scheduler: 1) rearrange the patch order, 2) change the commit log; 3) add all maintainers. Meng [-- Attachment #1.2: Type: text/html, Size: 2486 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Introduce rt real-time scheduler for Xen 2014-07-11 4:49 Introduce rt real-time scheduler for Xen Meng Xu ` (4 preceding siblings ...) 2014-07-11 10:50 ` Introduce rt real-time scheduler for Xen Wei Liu @ 2014-07-11 16:19 ` Dario Faggioli 5 siblings, 0 replies; 47+ messages in thread From: Dario Faggioli @ 2014-07-11 16:19 UTC (permalink / raw) To: Meng Xu Cc: ian.campbell, xisisu, stefano.stabellini, george.dunlap, ian.jackson, xen-devel, xumengpanda, lichong659, dgolomb [-- Attachment #1.1: Type: text/plain, Size: 6899 bytes --] On ven, 2014-07-11 at 00:49 -0400, Meng Xu wrote: > This serie of patches adds rt real-time scheduler to Xen. > He Meng, Sisu! Nice to see you here on xen-devel with this nice code-drop! :-P > In summary, It supports: > 1) Preemptive Global Earliest Deadline First scheduling policy by using a global RunQ for the scheduler; > 2) Assign/display each VCPU's parameters of each domain; > 3) Supports CPU Pool > Great, thanks for doing the effort of extracting this from your code base, and submit it here. :-) Having look at the series carefully, I think it's a nice piece of work already. There's quite a few modification and cleanups to do, and I think there's room for quite a bit of improvement, but I really like the fact that all the features are basically there already. In particular, proper SMP support, per-VCPU scheduling parameters, and a sane and theoretically sound budgetting scheme is what we're missing in SEDF[*], and we need these things badly! [*] Josh's RFC is improving this, but only wrt to the latter one (sane scheduling algorithm). > ----------------------------------------------------------------------------------------------------------------------------- > One scenario to show the functionality of this rt scheduler is as follows: > //list each vcpu's parameters of each domain in cpu pools using rt scheduler > #xl sched-rt > Cpupool Pool-0: sched=EDF > Name ID VCPU Period Budget > Domain-0 0 0 10 10 > Domain-0 0 1 20 20 > Domain-0 0 2 30 30 > Domain-0 0 3 10 10 > litmus1 1 0 10 4 > litmus1 1 1 10 4 > > [...] > Thanks for showing this also. > ----------------------------------------------------------------------------------------------------------------------------- > The differences between this new rt real-time scheduler and the sedf scheduler are as follows: > 1) rt scheduler supports global EDF scheduling, while sedf only supports partitioned scheduling. With the support of vcpu mask, rt scheduler can also be used as partitioned scheduling by setting each VCPU’s cpumask to a specific cpu. > Which is be biggest and most important difference. In fact, although the implementation of this scheduler can be improved (AFAICT) wrt this aspect too, adding SMP support to SEDF would be much much harder... > 2) rt scheduler supports setting and getting each VCPU’s parameters of a domain. A domain can have multiple vcpus with different parameters, rt scheduler can let user get/set the parameters of each VCPU of a specific domain; (sedf scheduler does not support it now) > 3) rt scheduler supports cpupool. > Right. Well, to be fair, SEDF supports cpupools as well. :-) > 4) rt scheduler uses deferrable server to burn/replenish budget of a VCPU, while sedf uses constrant bandwidth server to burn/replenish budget of a VCPU. This is just two options of implementing a global EDF real-time scheduler and both options’ real-time performance have already been proved in academic. 
> So, can you put some links to some of your works on top of RT-Xen, which is from which this scheduler comes from? Or, if that's not possible, at least the titles? I really don't expect people to jump on research papers, but the I've seen a few, and the experimental sections were nice to read and quite useful. > ----------------------------------------------------------------------------------------------------------------------------- > TODO: > Allow me to add a few items here, in some sort of priority order (at least mine one): *) Deal with budget overrun in the algorithm [medium] *) Split runnable and depleted (=no budget left) VCPU queues [easy] > 1) Improve the code of getting/setting each VCPU’s parameters. [easy] > Right now, it create an array with LIBXL_XEN_LEGACY_MAX_VCPUS (i.e., 32) elements to bounce all VCPUs’ parameters of a domain between xen tool and xen to get all VCPUs’ parameters of a domain. It is unnecessary to have LIBXL_XEN_LEGACY_MAX_VCPUS elements for this array. > The current work is to first get the exact number of VCPUs of a domain and then create an array with that exact number of elements to bounce between xen tool and xen. > 2) Provide microsecond time precision in xl interface instead of millisecond time precision. [easy] > Right now, rt scheduler let user to specify each VCPU’s parameters (period, budget) in millisecond (i.e., ms). In some real-time application, user may want to specify VCPUs’ parameters in microsecond (i.e., us). The next work is to let user specify VCPUs’ parameters in microsecond and count the time in microsecond (or nanosecond) in xen rt scheduler as well. > *) Subject Dom0 to the EDF+DS scheduling, as all other domains [easy] We can discuss what default Dom0 parameters should be, but we certainly want it to be scheduled as all other domains, and not getting too much of a special treatment. > 3) Add Xen trace into the rt scheduler. [easy] > We will add a few xentrace tracepoints, like TRC_CSCHED2_RUNQ_POS in credit2 scheduler, in rt scheduler, to debug via tracing. > *) Try using timers for replenishment, instead of scanning the full runqueue every now and then [medium] > 4) Method of improving the performance of rt scheduler [future work] > VCPUs of the same domain may preempt each other based on the preemptive global EDF scheduling policy. This self-switch issue does not bring benefit to the domain but introduce more overhead. When this situation happens, we can simply promote the current running lower-priority VCPU’s priority and let it borrow budget from higher priority VCPUs to avoid such self-swtich issue. > > Timeline of implementing the TODOs: > We plan to finish the TODO 1), 2) and 3) within 3-4 weeks (or earlier). > Because TODO 4) will make the scheduling policy not pure GEDF, (people who wants the real GEDF may not be happy with this.) we look forward to hearing people’s opinions. > That one is definitely something we can concentrate on later. > ----------------------------------------------------------------------------------------------------------------------------- > Special huge thanks to Dario Faggioli for his helpful and detailed comments on the preview version of this rt scheduler. 
:-) > :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
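As a pointer for the "split runnable and depleted VCPU queues [easy]" item above, a deliberately simplified insert routine is sketched below; the structure and field names (rt_private, rt_vcpu, runq, depletedq, runq_elem, cur_deadline, budget_left) are made up for illustration and none of this is taken from the posted patch.

    /* Sketch: keep two lists, so picking the earliest-deadline runnable VCPU
     * never has to skip over depleted ones.  VCPUs with budget are kept in
     * EDF order; depleted ones simply wait for their next replenishment. */
    static void runq_insert(struct rt_private *prv, struct rt_vcpu *svc)
    {
        struct list_head *iter;

        if ( svc->budget_left <= 0 )
        {
            list_add_tail(&svc->runq_elem, &prv->depletedq);
            return;
        }

        /* Walk until we find a later deadline, then insert before it. */
        list_for_each ( iter, &prv->runq )
        {
            const struct rt_vcpu *cur =
                list_entry(iter, struct rt_vcpu, runq_elem);
            if ( svc->cur_deadline < cur->cur_deadline )
                break;
        }
        list_add_tail(&svc->runq_elem, iter);  /* ends up just before iter */
    }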