From: Keir Fraser <keir@xen.org>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] xen: Implement domain runstates
Date: Tue, 23 Nov 2010 15:00:15 +0000 [thread overview]
Message-ID: <C911877F.ADA4%keir@xen.org> (raw)
In-Reply-To: <4CEBCE60.2030403@eu.citrix.com>
On 23/11/2010 14:23, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> There is a difference: how the various runstates of the vcpus were
> correlated.
>
> So for example, suppose you have 2 vcpus with the following runstates:
> v0: running 50%, runnable 25%, blocked 25%
> v1: running 50%, runnable 25%, blocked 25%
>
> That could break down like this:
> full run: 25%
> concurrency hazard: 50%
> blocked: 25%
>
> Or like this:
> full run: 5%
> concurrency hazard: 50%
> partial run: 40%
> blocked: 5%
>
> Or like this:
> full contention: 20%
> full run: 50%
> concurrency hazard: 5%
> blocked: 25%
>
> To me those tell very different stories about what the VM's workload is
> like and what it's getting cpu-wise from Xen.
The concurrency hazard %age might be useful for telling you how much Xen
scheduling is screwing the VM I suppose. Still I wonder how much of this
could typically be deduced from per-vcpu stats. And how much this improves
load balancer performance beyond very basic per-domain stats. Overall you
can count me as not 100% convinced. But mny own personal opinion on the
pros/cons of this particular toolstack interface is only part of my
argument...
>> Also, I would never take a patch for a new control interface
>> without the thing that uses it being supplied at the same time; Otherwise
>> you can just as well maintain it out of tree.
>
> Would "the thing that uses it" be a libxc call, or an actual program
> using the data (like xentop)?
>
> FWIW, xapi uses this call in XenServer (just for customer reporting).
> We could carry it out-of-tree, but we're trying to minimize the
> patchqueue. But if modifying xentop to be able to report these was a
> requisite for acceptance, I'd make an effort. :-)
Well, actually what we care about is #users-who-care. Plumbing into xentop
to satisfy an acceptance criteria is probably misguided. Reducing
patch-queue length just to make a patch-queue short is also probably
misguided.
I think this patch can live in the XCP patchqueue quite happily. If the
feature proves itself and others want it as a more general facility, for
other toolstacks or whatever, it can be considered for upstream in due
course. There's no rush.
-- Keir
> -George
>
>>
>> -- Keir
>>
>> On 23/11/2010 11:26, "George Dunlap"<george.dunlap@eu.citrix.com> wrote:
>>
>>> To help simplify the load configuration of servers, I developed the
>>> idea of a "domain runstate". The "runstate" of a domain is defined
>>> by the runstate of its constituent vcpus. The "states" are defined
>>> as follows:
>>>
>>> Blocked: All vcpus blocked
>>> Partial run: Some vcpus running, some blocked
>>> Partial contention: Some vcpus waiting for the cpu, some blocked
>>> Full run: All vcpus running
>>> Concurrency hazard: Some vcpus running, some waiting for the cpu
>>> Full contention: All vcpus waiting for the cpu
>>>
>>> The idea is that by looking at the amount of time in each state
>>> over the last unit of time, an administrator (or an automated
>>> load-balancing program) can determine whether their vcpu and/or
>>> domain configuration needs to be tweaked. For example:
>>>
>>> If a VM spends the majority of its time in "full run", it may
>>> not have enough vcpus to do its work.
>>>
>>> If a VM spends a large amount of time in full contention, the system
>>> is probably too busy, and some workloads should be moved onto other
>>> servers.
>>>
>>> If a VM spends a large amount of time in concurrency hazard, it means
>>> that the VM may be wasting a lot of time in spinlocks and other
>>> synchronization (like cache misses). Either work should be moved off
>>> the machine, or the number of vcpus assigned to the VM should be
>>> reduced.
>>>
>>> The state is protected by a lock, but to avoid tricky deadlock
>>> issues, if the lock cannot be acquired, the state is simply not
>>> updated. Number of lost state updates is recorded.
>>>
>>> Signed-off-by: George Dunlap<george.dunlap@eu.citrix.com>
>>>
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domain.c
>>> --- a/xen/common/domain.c Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/domain.c Tue Nov 23 11:25:50 2010 +0000
>>> @@ -238,6 +238,7 @@
>>> spin_lock_init_prof(d, domain_lock);
>>> spin_lock_init_prof(d, page_alloc_lock);
>>> spin_lock_init(&d->hypercall_deadlock_mutex);
>>> + spin_lock_init(&d->runstate_lock);
>>> INIT_PAGE_LIST_HEAD(&d->page_list);
>>> INIT_PAGE_LIST_HEAD(&d->xenpage_list);
>>>
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/domctl.c
>>> --- a/xen/common/domctl.c Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/domctl.c Tue Nov 23 11:25:50 2010 +0000
>>> @@ -970,6 +970,24 @@
>>> }
>>> break;
>>>
>>> + case XEN_DOMCTL_get_runstate_info:
>>> + {
>>> + struct domain *d;
>>> +
>>> + ret = -ESRCH;
>>> + d = rcu_lock_domain_by_id(op->domain);
>>> + if ( d != NULL )
>>> + {
>>> + domain_runstate_get(d,&op->u.domain_runstate);
>>> + ret = 0;
>>> +
>>> + rcu_unlock_domain(d);
>>> +
>>> + if ( copy_to_guest(u_domctl, op, 1) )
>>> + ret = -EFAULT;
>>> + }
>>> + break;
>>> + }
>>> default:
>>> ret = arch_do_domctl(op, u_domctl);
>>> break;
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/common/schedule.c
>>> --- a/xen/common/schedule.c Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/common/schedule.c Tue Nov 23 11:25:50 2010 +0000
>>> @@ -135,10 +135,31 @@
>>> }
>>> }
>>>
>>> +/* Used to quickly map the vcpu runstate mask to a domain runstate */
>>> +static int mask_to_state[] = {
>>> + /* 000: Nothing in any runstate. Should never happen. */
>>> + -1,
>>> + /* 001: All running */
>>> + DOMAIN_RUNSTATE_full_run,
>>> + /* 010: All runnable */
>>> + DOMAIN_RUNSTATE_full_contention,
>>> + /* 011: Some running, some runnable */
>>> + DOMAIN_RUNSTATE_concurrency_hazard,
>>> + /* 100: All blocked / offline */
>>> + DOMAIN_RUNSTATE_blocked,
>>> + /* 101: Some running, some blocked / offline */
>>> + DOMAIN_RUNSTATE_partial_run,
>>> + /* 110: Some blocked / offline, some runnable */
>>> + DOMAIN_RUNSTATE_partial_contention,
>>> + /* 111: Some in every state. Mixed running + runnable is most
>>> important.
>>> */
>>> + DOMAIN_RUNSTATE_concurrency_hazard
>>> +};
>>> +
>>> static inline void vcpu_runstate_change(
>>> struct vcpu *v, int new_state, s_time_t new_entry_time)
>>> {
>>> s_time_t delta;
>>> + struct domain *d = v->domain;
>>>
>>> ASSERT(v->runstate.state != new_state);
>>>
>>> ASSERT(spin_is_locked(per_cpu(schedule_data,v->processor).schedule_lock));
>>> @@ -155,6 +176,37 @@
>>> }
>>>
>>> v->runstate.state = new_state;
>>> +
>>> + /* Update domain runstate */
>>> + if(spin_trylock(&d->runstate_lock)) {
>>> + unsigned mask=0;
>>> + struct vcpu *ov;
>>> +
>>> + BUG_ON(d->runstate.state> DOMAIN_RUNSTATE_partial_contention);
>>> +
>>> + d->runstate.time[d->runstate.state] +=
>>> + (new_entry_time - d->runstate.state_entry_time);
>>> + d->runstate.state_entry_time = new_entry_time;
>>> +
>>> + /* Determine new runstate. First, see what states we have */
>>> + for_each_vcpu(d, ov) {
>>> + /* Don't count vcpus that have beent taken offline by the guest
>>> */
>>> + if(! (ov->runstate.state == RUNSTATE_offline
>>> +&& test_bit(_VPF_down,&ov->pause_flags)) )
>>> + mask |= (1<< ov->runstate.state);
>>> + }
>>> +
>>> + BUG_ON(mask == 0);
>>> +
>>> + /* Offline& blocked are the same */
>>> + mask |= ((1<< RUNSTATE_offline)& mask)>> 1;
>>> +
>>> + d->runstate.state = mask_to_state[mask&0x7];
>>> +
>>> + spin_unlock(&d->runstate_lock);
>>> + } else {
>>> + atomic_inc(&d->runstate_missed_changes);
>>> + }
>>> }
>>>
>>> void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info
>>> *runstate)
>>> @@ -173,6 +225,18 @@
>>> vcpu_schedule_unlock_irq(v);
>>> }
>>>
>>> +void domain_runstate_get(struct domain *d,
>>> + domain_runstate_info_t *runstate)
>>> +{
>>> + spin_lock(&d->runstate_lock);
>>> +
>>> + memcpy(runstate,&d->runstate, sizeof(*runstate));
>>> + runstate->time[d->runstate.state] += NOW() -
>>> runstate->state_entry_time;
>>> + runstate->missed_changes = atomic_read(&d->runstate_missed_changes);
>>> +
>>> + spin_unlock(&d->runstate_lock);
>>> +}
>>> +
>>> uint64_t get_cpu_idle_time(unsigned int cpu)
>>> {
>>> struct vcpu_runstate_info state;
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/public/domctl.h
>>> --- a/xen/include/public/domctl.h Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/include/public/domctl.h Tue Nov 23 11:25:50 2010 +0000
>>> @@ -806,6 +806,46 @@
>>> DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpuextstate_t);
>>> #endif
>>>
>>> +/*
>>> + * Return information about the state and running time of a domain.
>>> + * The "domain runstate" is based on the runstates of all the vcpus of the
>>> + * domain (see below).
>>> + * @extra_arg == pointer to domain_runstate_info structure.
>>> + */
>>> +struct xen_domctl_runstate_info {
>>> + /* Domain's's current state (DOMAIN_RUNSTATE_*). */
>>> + uint32_t state;
>>> + /* Number of times we missed an update due to contention */
>>> + uint32_t missed_changes;
>>> + /* When was current state entered (system time, ns)? */
>>> + uint64_t state_entry_time;
>>> + /*
>>> + * Time spent in each RUNSTATE_* (ns). The sum of these times is
>>> + * NOT guaranteed not to drift from system time.
>>> + */
>>> + uint64_t time[6];
>>> +};
>>> +typedef struct xen_domctl_runstate_info xen_domctl_runstate_info_t;
>>> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_runstate_info_t);
>>> +
>>> +/* All vcpus are running */
>>> +#define DOMAIN_RUNSTATE_full_run 0
>>> +
>>> +/* All vcpus are runnable (i.e., waiting for cpu) */
>>> +#define DOMAIN_RUNSTATE_full_contention 1
>>> +
>>> +/* Some vcpus are running, some are runnable */
>>> +#define DOMAIN_RUNSTATE_concurrency_hazard 2
>>> +
>>> +/* All vcpus are blocked / offline */
>>> +#define DOMAIN_RUNSTATE_blocked 3
>>> +
>>> +/* Some vpcus are running, some are blocked */
>>> +#define DOMAIN_RUNSTATE_partial_run 4
>>> +
>>> +/* Some vcpus are runnable, some are blocked */
>>> +#define DOMAIN_RUNSTATE_partial_contention 5
>>> +
>>> struct xen_domctl {
>>> uint32_t cmd;
>>> #define XEN_DOMCTL_createdomain 1
>>> @@ -868,6 +908,7 @@
>>> #define XEN_DOMCTL_getpageframeinfo3 61
>>> #define XEN_DOMCTL_setvcpuextstate 62
>>> #define XEN_DOMCTL_getvcpuextstate 63
>>> +#define XEN_DOMCTL_get_runstate_info 64
>>> #define XEN_DOMCTL_gdbsx_guestmemio 1000
>>> #define XEN_DOMCTL_gdbsx_pausevcpu 1001
>>> #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
>>> @@ -923,6 +964,7 @@
>>> struct xen_domctl_gdbsx_memio gdbsx_guest_memio;
>>> struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
>>> struct xen_domctl_gdbsx_domstatus gdbsx_domstatus;
>>> + struct xen_domctl_runstate_info domain_runstate;
>>> uint8_t pad[128];
>>> } u;
>>> };
>>> diff -r c0c1f5f0745e -r 692200e7ce1e xen/include/xen/sched.h
>>> --- a/xen/include/xen/sched.h Mon Nov 22 19:16:34 2010 +0000
>>> +++ b/xen/include/xen/sched.h Tue Nov 23 11:25:50 2010 +0000
>>> @@ -200,6 +200,8 @@
>>> int xen_port;
>>> };
>>>
>>> +typedef struct xen_domctl_runstate_info domain_runstate_info_t;
>>> +
>>> struct domain
>>> {
>>> domid_t domain_id;
>>> @@ -332,6 +334,11 @@
>>> nodemask_t node_affinity;
>>> unsigned int last_alloc_node;
>>> spinlock_t node_affinity_lock;
>>> +
>>> + /* Domain runstate tracking */
>>> + spinlock_t runstate_lock;
>>> + atomic_t runstate_missed_changes;
>>> + domain_runstate_info_t runstate;
>>> };
>>>
>>> struct domain_setup_info
>>> @@ -614,6 +621,7 @@
>>> int vcpu_set_affinity(struct vcpu *v, cpumask_t *affinity);
>>>
>>> void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info
>>> *runstate);
>>> +void domain_runstate_get(struct domain *d, domain_runstate_info_t
>>> *runstate);
>>> uint64_t get_cpu_idle_time(unsigned int cpu);
>>>
>>> /*
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
>>
>>
>
next prev parent reply other threads:[~2010-11-23 15:00 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-23 11:26 [PATCH] xen: Implement domain runstates George Dunlap
2010-11-23 12:51 ` Keir Fraser
2010-11-23 14:23 ` George Dunlap
2010-11-23 15:00 ` Keir Fraser [this message]
2010-11-23 15:48 ` George Dunlap
2010-11-23 18:41 ` Pasi Kärkkäinen
2010-11-25 19:11 ` Ian Jackson
2010-11-25 22:10 ` Keir Fraser
2010-11-25 20:02 ` Bruce Edge
2010-11-25 22:24 ` Keir Fraser
2010-11-25 22:32 ` Bruce Edge
2010-11-26 11:52 ` George Dunlap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=C911877F.ADA4%keir@xen.org \
--to=keir@xen.org \
--cc=George.Dunlap@eu.citrix.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.