From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file
Date: Thu, 12 Apr 2012 11:24:46 +0100
Message-ID: <4F86AD6E.3050705@eu.citrix.com>
In-Reply-To: <7e76233448b02810f0ae.1334150272@Solace>
On 11/04/12 14:17, Dario Faggioli wrote:
> Let the user specify the NUMA node affinity of a guest via the
> 'nodes=' config switch. Explicitly listing the intended target host
> nodes is required as of now.
>
> A valid setting will directly impact the node_affinity mask
> of the guest, i.e., it will change the behaviour of the low level
> memory allocator. On the other hand, this commit does not affect
> by any means how the guest's vCPUs are scheduled on the host's
> pCPUs.
I would probably break this into three separate patches, starting with
the hypervisor, then libxc, then libxl, especially since they tend to
have different maintainers and committers.
>
> TODO: * Better investigate and test interactions with cpupools.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -112,6 +112,15 @@ List of which cpus the guest is allowed
> (all vcpus will run on cpus 0,2,3,5), or `cpus=["2", "3"]` (all vcpus
> will run on cpus 2 and 3).
>
> +=item B<nodes=[ NODE, NODE, ...]>
> +
> +List of the NUMA nodes of the host the guest should be considered
> +`affine` with. Being affine to a (set of) NUMA node(s) mainly means
> +the guest's memory is going to be allocated on those node(s).
> +
> +A list of nodes should be specified as follows: `nodes=["0", "3"]`
> +for the guest to be affine with nodes 0 and 3.
> +
Hmm, I think that using "affine" here is technically correct, and is
what one would use if writing a research paper; but the word is unusual
in everyday English. It would be more common to hear someone describe a
VM as "having affinity with". How about something like this:

"The list of NUMA nodes the domain is considered to have affinity with.
Memory for the guest will be allocated from these nodes."

(Technically you're also not supposed to end a sentence with a
preposition, but I think "...with which the domain is considered to have
affinity" is just too awkward.)

Also, is there a way to specify that the affinity should cover all of
the nodes, and/or that it should be derived from the vcpu affinity
masks of the vcpus?
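Just to make the first part of that concrete -- a rough, untested sketch
of how xl's list parsing could accept an "all" keyword. It assumes a
libxl_nodemap_set_any() helper analogous to the libxl_cpumap_set_any()
used for cpus=; if the earlier patches don't provide one, it would need
adding:

    /* Sketch: let nodes=["all"] mean "affinity with every host node".
     * libxl_nodemap_set_any() is assumed here, mirroring
     * libxl_cpumap_set_any(). */
    while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) {
        if (!strcmp(buf, "all")) {
            /* All nodes at once; no per-node validity check needed. */
            libxl_nodemap_set_any(&b_info->nodemap);
            n_nodes++;
            continue;
        }
        i = atoi(buf);
        if (!libxl_nodemap_node_valid(&b_info->nodemap, i)) {
            fprintf(stderr, "node %d illegal\n", i);
            exit(1);
        }
        libxl_nodemap_set(&b_info->nodemap, i);
        n_nodes++;
    }
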
> =item B<memory=MBYTES>
>
> Start the guest with MBYTES megabytes of RAM.
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -110,6 +110,44 @@ int xc_domain_shutdown(xc_interface *xch
> }
>
>
> +int xc_domain_node_affinity(xc_interface *xch,
> + uint32_t domid,
> + xc_nodemap_t node_affinity)
> +{
> + DECLARE_DOMCTL;
> + DECLARE_HYPERCALL_BUFFER(uint8_t, local);
> + int ret = -1;
> + int nodesize;
> +
> + nodesize = xc_get_nodemap_size(xch);
> + if (!nodesize)
> + {
> + PERROR("Could not get number of nodes");
> + goto out;
> + }
> +
> + local = xc_hypercall_buffer_alloc(xch, local, nodesize);
> + if ( local == NULL )
> + {
> + PERROR("Could not allocate memory for domain_node_affinity domctl hypercall");
> + goto out;
> + }
> +
> + domctl.cmd = XEN_DOMCTL_node_affinity;
> + domctl.domain = (domid_t)domid;
> +
> + memcpy(local, node_affinity, nodesize);
> + set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local);
> + domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8;
> +
> + ret = do_domctl(xch, &domctl);
> +
> + xc_hypercall_buffer_free(xch, local);
> +
> + out:
> + return ret;
> +}
> +
> int xc_vcpu_setaffinity(xc_interface *xch,
> uint32_t domid,
> int vcpu,
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -520,6 +520,19 @@ int xc_watchdog(xc_interface *xch,
> uint32_t id,
> uint32_t timeout);
>
> +/**
> + * This function explicitly sets the affinity of a domain with the
> + * host NUMA nodes.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id in which vcpus are to be created.
> + * @parm node_affinity the map of the affine nodes.
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_domain_node_affinity(xc_interface *xch,
> + uint32_t domind,
> + xc_nodemap_t node_affinity);
> +
Seems like it would be useful to have both a get and a set.
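Something like the following is what I have in mind for the get side.
It's only a sketch: it assumes a companion XEN_DOMCTL_getnodeaffinity
domctl (or a get/set flag in xen_domctl_node_affinity), which this patch
doesn't define, with the hypervisor filling the nodemap in:

    /* Hypothetical getter, mirroring xc_domain_node_affinity() above. */
    int xc_domain_node_getaffinity(xc_interface *xch,
                                   uint32_t domid,
                                   xc_nodemap_t node_affinity)
    {
        DECLARE_DOMCTL;
        DECLARE_HYPERCALL_BUFFER(uint8_t, local);
        int ret = -1;
        int nodesize;

        nodesize = xc_get_nodemap_size(xch);
        if (!nodesize)
        {
            PERROR("Could not get number of nodes");
            goto out;
        }

        local = xc_hypercall_buffer_alloc(xch, local, nodesize);
        if ( local == NULL )
        {
            PERROR("Could not allocate memory for getnodeaffinity domctl hypercall");
            goto out;
        }

        domctl.cmd = XEN_DOMCTL_getnodeaffinity;   /* hypothetical domctl */
        domctl.domain = (domid_t)domid;
        set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local);
        domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8;

        ret = do_domctl(xch, &domctl);

        /* On success the hypervisor has written the mask into 'local'. */
        if ( ret == 0 )
            memcpy(node_affinity, local, nodesize);

        xc_hypercall_buffer_free(xch, local);

     out:
        return ret;
    }
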
> int xc_vcpu_setaffinity(xc_interface *xch,
> uint32_t domid,
> int vcpu,
> diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py
> --- a/tools/libxl/gentest.py
> +++ b/tools/libxl/gentest.py
> @@ -20,7 +20,7 @@ def randomize_case(s):
> def randomize_enum(e):
> return random.choice([v.name for v in e.values])
>
> -handcoded = ["libxl_cpumap", "libxl_key_value_list",
> +handcoded = ["libxl_cpumap", "libxl_nodemap", "libxl_key_value_list",
> "libxl_cpuid_policy_list", "libxl_file_reference",
> "libxl_string_list"]
>
> @@ -119,6 +119,19 @@ static void libxl_cpumap_rand_init(libxl
> }
> }
>
> +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap)
> +{
> + int i;
> + nodemap->size = rand() % 16;
> + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map));
> + libxl_for_each_node(i, *nodemap) {
> + if (rand() % 2)
> + libxl_nodemap_set(nodemap, i);
> + else
> + libxl_nodemap_reset(nodemap, i);
> + }
> +}
> +
> static void libxl_key_value_list_rand_init(libxl_key_value_list *pkvl)
> {
> int i, nr_kvp = rand() % 16;
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -3082,6 +3082,16 @@ int libxl_set_vcpuaffinity_all(libxl_ctx
> return rc;
> }
>
> +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
> + libxl_nodemap *nodemap)
> +{
> + if (xc_domain_node_affinity(ctx->xch, domid, nodemap->map)) {
> + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting node affinity");
> + return ERROR_FAIL;
> + }
> + return 0;
> +}
> +
> int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap)
> {
> GC_INIT(ctx);
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -727,6 +727,8 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct
> libxl_cpumap *cpumap);
> int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
> unsigned int max_vcpus, libxl_cpumap *cpumap);
> +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
> + libxl_nodemap *nodemap);
Hmm -- is there really no libxl_get_vcpuaffinity()? That seems to be a
missing component...
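The same goes for the node affinity added here: the libxl-level getter
would presumably just wrap the xc one I sketched above (both are
hypothetical, not part of this patch):

    /* Hypothetical; nodemap must have been allocated by the caller. */
    int libxl_get_node_affinity(libxl_ctx *ctx, uint32_t domid,
                                libxl_nodemap *nodemap)
    {
        if (xc_domain_node_getaffinity(ctx->xch, domid, nodemap->map)) {
            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting node affinity");
            return ERROR_FAIL;
        }
        return 0;
    }
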
> int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap);
>
> libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx);
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -121,6 +121,12 @@ int libxl__domain_build_info_setdefault(
> libxl_cpumap_set_any(&b_info->cpumap);
> }
>
> + if (!b_info->nodemap.size) {
> + if (libxl_nodemap_alloc(CTX, &b_info->nodemap))
> + return ERROR_NOMEM;
> + libxl_nodemap_set_none(&b_info->nodemap);
> + }
> +
> if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT)
> b_info->max_memkb = 32 * 1024;
> if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -67,6 +67,7 @@ int libxl__build_pre(libxl__gc *gc, uint
> char *xs_domid, *con_domid;
> xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus);
> libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
> + libxl_set_node_affinity(ctx, domid, &info->nodemap);
> xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
> if (info->type == LIBXL_DOMAIN_TYPE_PV)
> xc_domain_set_memmap_limit(ctx->xch, domid,
> diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c
> --- a/tools/libxl/libxl_json.c
> +++ b/tools/libxl/libxl_json.c
> @@ -119,6 +119,26 @@ out:
> return s;
> }
>
> +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand,
> + libxl_nodemap *nodemap)
> +{
> + yajl_gen_status s;
> + int i;
> +
> + s = yajl_gen_array_open(hand);
> + if (s != yajl_gen_status_ok) goto out;
> +
> + libxl_for_each_node(i, *nodemap) {
> + if (libxl_nodemap_test(nodemap, i)) {
> + s = yajl_gen_integer(hand, i);
> + if (s != yajl_gen_status_ok) goto out;
> + }
> + }
> + s = yajl_gen_array_close(hand);
> +out:
> + return s;
> +}
> +
> yajl_gen_status libxl_key_value_list_gen_json(yajl_gen hand,
> libxl_key_value_list *pkvl)
> {
> diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h
> --- a/tools/libxl/libxl_json.h
> +++ b/tools/libxl/libxl_json.h
> @@ -27,6 +27,7 @@ yajl_gen_status libxl_domid_gen_json(yaj
> yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p);
> yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p);
> yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p);
> +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, libxl_nodemap *p);
> yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
> libxl_cpuid_policy_list *p);
> yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -11,6 +11,7 @@ libxl_domid = Builtin("domid", json_fn =
> libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE)
> libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE)
> libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE)
> +libxl_nodemap = Builtin("nodemap", dispose_fn="libxl_nodemap_dispose", passby=PASS_BY_REFERENCE)
> libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE)
>
> libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE)
> @@ -233,6 +234,7 @@ libxl_domain_build_info = Struct("domain
> ("max_vcpus", integer),
> ("cur_vcpus", integer),
> ("cpumap", libxl_cpumap),
> + ("nodemap", libxl_nodemap),
> ("tsc_mode", libxl_tsc_mode),
> ("max_memkb", MemKB),
> ("target_memkb", MemKB),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -515,7 +515,7 @@ static void parse_config_data(const char
> const char *buf;
> long l;
> XLU_Config *config;
> - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids;
> + XLU_ConfigList *cpus, *nodes, *vbds, *nics, *pcis, *cvfbs, *cpuids;
> int pci_power_mgmt = 0;
> int pci_msitranslate = 1;
> int pci_permissive = 0;
> @@ -628,6 +628,39 @@ static void parse_config_data(const char
> free(buf2);
> }
>
> + if (!xlu_cfg_get_list (config, "nodes", &nodes, 0, 0)) {
> + int i, n_nodes = 0;
> +
> + if (libxl_nodemap_alloc(ctx, &b_info->nodemap)) {
> + fprintf(stderr, "Unable to allocate nodemap\n");
> + exit(1);
> + }
> +
> + libxl_nodemap_set_none(&b_info->nodemap);
> + while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) {
> + i = atoi(buf);
> + if (!libxl_nodemap_node_valid(&b_info->nodemap, i)) {
> + fprintf(stderr, "node %d illegal\n", i);
> + exit(1);
> + }
> + libxl_nodemap_set(&b_info->nodemap, i);
> + n_nodes++;
> + }
> +
> + if (n_nodes == 0)
> + fprintf(stderr, "No valid node found: no affinity will be set\n");
> + }
> + else {
> + /*
> + * XXX What would a sane default be? I think doing nothing
> + * (i.e., relying on cpu-affinity/cpupool ==> the current
> + * behavior) should be just fine.
> + * That would mean we're saying to the user "if you want us
> + * to take care of NUMA, please tell us, maybe just putting
> + * 'nodes=auto', but tell us... otherwise we do as usual".
> + */
I think that for this patch, doing nothing is fine (which means removing
the whole else clause). But once we have the auto placement, I think
that "nodes=auto" should be the default.
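Concretely, once the placement series is in, I'd expect the build-time
side to end up looking roughly like the sketch below. libxl__numa_place()
is a name I've just made up for whatever the placement patches provide,
so treat this purely as an illustration of the default I'd like:

    /* Sketch (for after the placement series, not this patch): make
     * automatic placement the default when the user listed no nodes. */
    int i, has_nodes = 0;

    libxl_for_each_node(i, info->nodemap)
        if (libxl_nodemap_test(&info->nodemap, i))
            has_nodes = 1;

    if (has_nodes)
        libxl_set_node_affinity(ctx, domid, &info->nodemap);
    else
        libxl__numa_place(gc, domid, info);   /* hypothetical helper */
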
> + }
> +
> if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
> b_info->max_memkb = l * 1024;
> b_info->target_memkb = b_info->max_memkb;
> diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
> --- a/tools/python/xen/lowlevel/xl/xl.c
> +++ b/tools/python/xen/lowlevel/xl/xl.c
> @@ -243,6 +243,18 @@ int attrib__libxl_cpumap_set(PyObject *v
> return 0;
> }
>
> +int attrib__libxl_nodemap_set(PyObject *v, libxl_nodemap *pptr)
> +{
> + int i;
> + long node;
> +
> + for (i = 0; i < PyList_Size(v); i++) {
> + node = PyInt_AsLong(PyList_GetItem(v, i));
> + libxl_nodemap_set(pptr, node);
> + }
> + return 0;
> +}
> +
> int attrib__libxl_file_reference_set(PyObject *v, libxl_file_reference *pptr)
> {
> return genwrap__string_set(v, &pptr->path);
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -222,6 +222,7 @@ struct domain *domain_create(
>
> spin_lock_init(&d->node_affinity_lock);
> d->node_affinity = NODE_MASK_ALL;
> + d->has_node_affinity = 0;
>
> spin_lock_init(&d->shutdown_lock);
> d->shutdown_code = -1;
> @@ -337,7 +338,7 @@ void domain_update_node_affinity(struct
> {
> cpumask_var_t cpumask;
> cpumask_var_t online_affinity;
> - const cpumask_t *online;
> + const cpumask_t *online = cpupool_online_cpumask(d->cpupool);
> nodemask_t nodemask = NODE_MASK_NONE;
> struct vcpu *v;
> unsigned int node;
> @@ -350,9 +351,22 @@ void domain_update_node_affinity(struct
> return;
> }
>
> - online = cpupool_online_cpumask(d->cpupool);
> + spin_lock(&d->node_affinity_lock);
>
> - spin_lock(&d->node_affinity_lock);
> + /*
> + * If someone explicitly told us our NUMA affinity, avoid messing
> + * that up. Notice, however, that vcpu affinity is still what
> + * drives all scheduling decisions.
> + *
> + * XXX I'm quite sure this is fine wrt to the various v->cpu_affinity
> + * (at least, that was the intended design!). Could it be useful
> + * to cross-check d->node_affinity against `online`? The basic
> + * idea here is "Hey, if you shoot yourself in the foot... You've
> + * shot in the foot!", but, you know...
> + */
> + if ( d->has_node_affinity )
> + goto out;
> +
>
> for_each_vcpu ( d, v )
> {
> @@ -365,6 +379,8 @@ void domain_update_node_affinity(struct
> node_set(node, nodemask);
>
> d->node_affinity = nodemask;
> +
> +out:
> spin_unlock(&d->node_affinity_lock);
>
> free_cpumask_var(online_affinity);
> @@ -372,6 +388,31 @@ void domain_update_node_affinity(struct
> }
>
>
> +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
> +{
> + spin_lock(&d->node_affinity_lock);
> +
> + /*
> + * Being/becoming affine to _no_ nodes is not going to work,
> + * let's take it as the `reset node affinity` command.
> + */
> + if ( nodes_empty(*affinity) )
> + {
> + d->has_node_affinity = 0;
> + spin_unlock(&d->node_affinity_lock);
> + domain_update_node_affinity(d);
> + return 0;
> + }
> +
> + d->has_node_affinity = 1;
> + d->node_affinity = *affinity;
> +
> + spin_unlock(&d->node_affinity_lock);
> +
> + return 0;
> +}
> +
> +
> struct domain *get_domain_by_id(domid_t dom)
> {
> struct domain *d;
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -621,6 +621,27 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
> }
> break;
>
> + case XEN_DOMCTL_node_affinity:
> + {
> + domid_t dom = op->domain;
> + struct domain *d = rcu_lock_domain_by_id(dom);
> + nodemask_t new_affinity;
> +
> + ret = -ESRCH;
> + if ( d == NULL )
> + break;
> +
> + /* XXX We need an xsm_* for this I guess, right? */
Yes. :-)
> +
> + ret = xenctl_nodemap_to_nodemask(&new_affinity,
> +     &op->u.node_affinity.nodemap);
> + if ( !ret )
> + ret = domain_set_node_affinity(d, &new_affinity);
> +
> + rcu_unlock_domain(d);
> + }
> + break;
> +
> case XEN_DOMCTL_setvcpuaffinity:
> case XEN_DOMCTL_getvcpuaffinity:
> {
> diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -217,6 +217,14 @@ static void cpuset_print(char *set, int
> *set++ = '\0';
> }
>
> +static void nodeset_print(char *set, int size, const nodemask_t *mask)
> +{
> + *set++ = '[';
> + set += nodelist_scnprintf(set, size-2, mask);
> + *set++ = ']';
> + *set++ = '\0';
> +}
> +
> static void periodic_timer_print(char *str, int size, uint64_t period)
> {
> if ( period == 0 )
> @@ -272,6 +280,9 @@ static void dump_domains(unsigned char k
>
> dump_pageframe_info(d);
>
> + nodeset_print(tmpstr, sizeof(tmpstr), &d->node_affinity);
> + printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr);
> +
> printk("VCPU information and callbacks for domain %u:\n",
> d->domain_id);
> for_each_vcpu ( d, v )
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -278,6 +278,15 @@ typedef struct xen_domctl_getvcpuinfo xe
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t);
>
>
> +/* Set the NUMA node(s) with which the guest is `affine`. */
> +/* XEN_DOMCTL_node_affinity */
> +struct xen_domctl_node_affinity {
> + struct xenctl_map nodemap; /* IN */
> +};
> +typedef struct xen_domctl_node_affinity xen_domctl_node_affinity_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_node_affinity_t);
> +
> +
> /* Get/set which physical cpus a vcpu can execute on. */
> /* XEN_DOMCTL_setvcpuaffinity */
> /* XEN_DOMCTL_getvcpuaffinity */
> @@ -914,6 +923,7 @@ struct xen_domctl {
> #define XEN_DOMCTL_set_access_required 64
> #define XEN_DOMCTL_audit_p2m 65
> #define XEN_DOMCTL_set_virq_handler 66
> +#define XEN_DOMCTL_node_affinity 67
> #define XEN_DOMCTL_gdbsx_guestmemio 1000
> #define XEN_DOMCTL_gdbsx_pausevcpu 1001
> #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
> @@ -927,6 +937,7 @@ struct xen_domctl {
> struct xen_domctl_getpageframeinfo getpageframeinfo;
> struct xen_domctl_getpageframeinfo2 getpageframeinfo2;
> struct xen_domctl_getpageframeinfo3 getpageframeinfo3;
> + struct xen_domctl_node_affinity node_affinity;
> struct xen_domctl_vcpuaffinity vcpuaffinity;
> struct xen_domctl_shadow_op shadow_op;
> struct xen_domctl_max_mem max_mem;
> diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
> --- a/xen/include/xen/nodemask.h
> +++ b/xen/include/xen/nodemask.h
> @@ -8,8 +8,9 @@
> * See detailed comments in the file linux/bitmap.h describing the
> * data type on which these nodemasks are based.
> *
> - * For details of nodemask_scnprintf() and nodemask_parse(),
> - * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c.
> + * For details of nodemask_scnprintf(), nodelist_scnpintf() and
> + * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse()
> + * in lib/bitmap.c.
> *
> * The available nodemask operations are:
> *
> @@ -48,6 +49,7 @@
> * unsigned long *nodes_addr(mask) Array of unsigned long's in mask
> *
> * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing
> + * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing
> * int nodemask_parse(ubuf, ulen, mask) Parse ascii string as nodemask
> *
> * for_each_node_mask(node, mask) for-loop node over mask
> @@ -280,6 +282,14 @@ static inline int __first_unset_node(con
>
> #define nodes_addr(src) ((src).bits)
>
> +#define nodelist_scnprintf(buf, len, src) \
> + __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES)
> +static inline int __nodelist_scnprintf(char *buf, int len,
> + const nodemask_t *srcp, int nbits)
> +{
> + return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
> +}
> +
> #if 0
> #define nodemask_scnprintf(buf, len, src) \
> __nodemask_scnprintf((buf), (len), &(src), MAX_NUMNODES)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -346,8 +346,12 @@ struct domain
> /* Various mem_events */
> struct mem_event_per_domain *mem_event;
>
> - /* Currently computed from union of all vcpu cpu-affinity masks. */
> + /*
> + * Can be specified by the user. If that is not the case, it is
> + * computed from the union of all the vcpu cpu-affinity masks.
> + */
> nodemask_t node_affinity;
> + int has_node_affinity;
There's something that seems a bit clunky about this, but I can't really
think of a better way. At the very least I'd change the name of the
element here to something more descriptive; perhaps "auto_node_affinity"
(which would invert the meaning)?
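For example, with the meaning inverted, the hypervisor side would read
roughly like this (paraphrasing the existing update loop from memory, so
treat it as a sketch only):

    /* domain_create(): automatic by default */
    d->auto_node_affinity = 1;

    /* domain_update_node_affinity(): only recompute when automatic */
    spin_lock(&d->node_affinity_lock);
    if ( d->auto_node_affinity )
    {
        for_each_vcpu ( d, v )
        {
            cpumask_and(online_affinity, v->cpu_affinity, online);
            cpumask_or(cpumask, cpumask, online_affinity);
        }
        for_each_online_node ( node )
            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
                node_set(node, nodemask);
        d->node_affinity = nodemask;
    }
    spin_unlock(&d->node_affinity_lock);

    /* domain_set_node_affinity(): an empty mask re-enables the
     * automatic behaviour, anything else switches it off. */
    if ( nodes_empty(*affinity) )
        d->auto_node_affinity = 1;
    else
    {
        d->auto_node_affinity = 0;
        d->node_affinity = *affinity;
    }

That way the flag reads as a positive property ("compute it for me")
rather than a double negative when you test it.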
> unsigned int last_alloc_node;
> spinlock_t node_affinity_lock;
> };
> @@ -416,6 +420,7 @@ static inline void get_knownalive_domain
> ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED));
> }
>
> +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity);
> void domain_update_node_affinity(struct domain *d);
>
> struct domain *domain_create(