From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file Date: Thu, 12 Apr 2012 11:24:46 +0100 Message-ID: <4F86AD6E.3050705@eu.citrix.com> References: <7e76233448b02810f0ae.1334150272@Solace> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <7e76233448b02810f0ae.1334150272@Solace> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli Cc: Andre Przywara , Ian Campbell , Stefano Stabellini , Juergen Gross , Ian Jackson , "xen-devel@lists.xen.org" , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 11/04/12 14:17, Dario Faggioli wrote: > Let the user specify the NUMA node affinity of a guest via the > 'nodes=' config switch. Explicitly listing the intended target host > nodes is required as of now. > > A valid setting will directly impact the node_affinity mask > of the guest, i.e., it will change the behaviour of the low level > memory allocator. On the other hand, this commit does not affect > by any means how the guest's vCPUs are scheduled on the host's > pCPUs. I would probably break this into three separate patches, starting with the hypervisor, then libxc, then libxl, especially since they tend to have different maintainers and committers. > > TODO: * Better investigate and test interactions with cpupools. > > Signed-off-by: Dario Faggioli > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 > --- a/docs/man/xl.cfg.pod.5 > +++ b/docs/man/xl.cfg.pod.5 > @@ -112,6 +112,15 @@ List of which cpus the guest is allowed > (all vcpus will run on cpus 0,2,3,5), or `cpus=["2", "3"]` (all vcpus > will run on cpus 2 and 3). > > +=item B > + > +List of the NUMA nodes of the host the guest should be considered > +`affine` with. Being affine to a (set of) NUMA node(s) mainly means > +the guest's memory is going to be allocated on those node(s). > + > +A list of nodes should be specified as follows: `nodes=["0", "3"]` > +for the guest to be affine with nodes 0 and 3. > + Hmm, I think that using "affine" here is technically correct, and is what one would use if writing a research paper; but it's unusual to hear the word in more common English; it would be more common to hear someone describe a VM as "having affinity with". How about something like this: "The list of NUMA nodes the domain is considered to have affinity with. Memory from the guest will be allocated from these nodes." (Technically you're also not supposed to end a sentence with a preposition, but I think "...with which the domain is considered to have affinity" is just to awkward.) Also, is there a way to specify that the affinity to be to all nodes and/or based on the vcpu mask of the vcpus? > =item B > > Start the guest with MBYTES megabytes of RAM. > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c > --- a/tools/libxc/xc_domain.c > +++ b/tools/libxc/xc_domain.c > @@ -110,6 +110,44 @@ int xc_domain_shutdown(xc_interface *xch > } > > > +int xc_domain_node_affinity(xc_interface *xch, > + uint32_t domid, > + xc_nodemap_t node_affinity) > +{ > + DECLARE_DOMCTL; > + DECLARE_HYPERCALL_BUFFER(uint8_t, local); > + int ret = -1; > + int nodesize; > + > + nodesize = xc_get_nodemap_size(xch); > + if (!nodesize) > + { > + PERROR("Could not get number of nodes"); > + goto out; > + } > + > + local = xc_hypercall_buffer_alloc(xch, local, nodesize); > + if ( local == NULL ) > + { > + PERROR("Could not allocate memory for domain_node_affinity domctl hypercall"); > + goto out; > + } > + > + domctl.cmd = XEN_DOMCTL_node_affinity; > + domctl.domain = (domid_t)domid; > + > + memcpy(local, node_affinity, nodesize); > + set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local); > + domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8; > + > + ret = do_domctl(xch,&domctl); > + > + xc_hypercall_buffer_free(xch, local); > + > + out: > + return ret; > +} > + > int xc_vcpu_setaffinity(xc_interface *xch, > uint32_t domid, > int vcpu, > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h > --- a/tools/libxc/xenctrl.h > +++ b/tools/libxc/xenctrl.h > @@ -520,6 +520,19 @@ int xc_watchdog(xc_interface *xch, > uint32_t id, > uint32_t timeout); > > +/** > + * This function explicitly sets the affinity of a domain with the > + * host NUMA nodes. > + * > + * @parm xch a handle to an open hypervisor interface. > + * @parm domid the domain id in which vcpus are to be created. > + * @parm node_affinity the map of the affine nodes. > + * @return 0 on success, -1 on failure. > + */ > +int xc_domain_node_affinity(xc_interface *xch, > + uint32_t domind, > + xc_nodemap_t node_affinity); > + Seems like it would be useful to have both a get and a set. > int xc_vcpu_setaffinity(xc_interface *xch, > uint32_t domid, > int vcpu, > diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py > --- a/tools/libxl/gentest.py > +++ b/tools/libxl/gentest.py > @@ -20,7 +20,7 @@ def randomize_case(s): > def randomize_enum(e): > return random.choice([v.name for v in e.values]) > > -handcoded = ["libxl_cpumap", "libxl_key_value_list", > +handcoded = ["libxl_cpumap", "libxl_nodemap", "libxl_key_value_list", > "libxl_cpuid_policy_list", "libxl_file_reference", > "libxl_string_list"] > > @@ -119,6 +119,19 @@ static void libxl_cpumap_rand_init(libxl > } > } > > +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap) > +{ > + int i; > + nodemap->size = rand() % 16; > + nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map)); > + libxl_for_each_node(i, *nodemap) { > + if (rand() % 2) > + libxl_nodemap_set(nodemap, i); > + else > + libxl_nodemap_reset(nodemap, i); > + } > +} > + > static void libxl_key_value_list_rand_init(libxl_key_value_list *pkvl) > { > int i, nr_kvp = rand() % 16; > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -3082,6 +3082,16 @@ int libxl_set_vcpuaffinity_all(libxl_ctx > return rc; > } > > +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid, > + libxl_nodemap *nodemap) > +{ > + if (xc_domain_node_affinity(ctx->xch, domid, nodemap->map)) { > + LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting node affinity"); > + return ERROR_FAIL; > + } > + return 0; > +} > + > int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap) > { > GC_INIT(ctx); > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -727,6 +727,8 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct > libxl_cpumap *cpumap); > int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid, > unsigned int max_vcpus, libxl_cpumap *cpumap); > +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid, > + libxl_nodemap *nodemap); Hmm -- is there really no libxl_get_vcpuaffinity()? That seems to be a missing component... > int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap); > > libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx); > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -121,6 +121,12 @@ int libxl__domain_build_info_setdefault( > libxl_cpumap_set_any(&b_info->cpumap); > } > > + if (!b_info->nodemap.size) { > + if (libxl_nodemap_alloc(CTX,&b_info->nodemap)) > + return ERROR_NOMEM; > + libxl_nodemap_set_none(&b_info->nodemap); > + } > + > if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT) > b_info->max_memkb = 32 * 1024; > if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT) > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > --- a/tools/libxl/libxl_dom.c > +++ b/tools/libxl/libxl_dom.c > @@ -67,6 +67,7 @@ int libxl__build_pre(libxl__gc *gc, uint > char *xs_domid, *con_domid; > xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus); > libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus,&info->cpumap); > + libxl_set_node_affinity(ctx, domid,&info->nodemap); > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT); > if (info->type == LIBXL_DOMAIN_TYPE_PV) > xc_domain_set_memmap_limit(ctx->xch, domid, > diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c > --- a/tools/libxl/libxl_json.c > +++ b/tools/libxl/libxl_json.c > @@ -119,6 +119,26 @@ out: > return s; > } > > +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, > + libxl_nodemap *nodemap) > +{ > + yajl_gen_status s; > + int i; > + > + s = yajl_gen_array_open(hand); > + if (s != yajl_gen_status_ok) goto out; > + > + libxl_for_each_node(i, *nodemap) { > + if (libxl_nodemap_test(nodemap, i)) { > + s = yajl_gen_integer(hand, i); > + if (s != yajl_gen_status_ok) goto out; > + } > + } > + s = yajl_gen_array_close(hand); > +out: > + return s; > +} > + > yajl_gen_status libxl_key_value_list_gen_json(yajl_gen hand, > libxl_key_value_list *pkvl) > { > diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h > --- a/tools/libxl/libxl_json.h > +++ b/tools/libxl/libxl_json.h > @@ -27,6 +27,7 @@ yajl_gen_status libxl_domid_gen_json(yaj > yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p); > yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p); > yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p); > +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, libxl_nodemap *p); > yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand, > libxl_cpuid_policy_list *p); > yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p); > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -11,6 +11,7 @@ libxl_domid = Builtin("domid", json_fn = > libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE) > libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE) > libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE) > +libxl_nodemap = Builtin("nodemap", dispose_fn="libxl_nodemap_dispose", passby=PASS_BY_REFERENCE) > libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE) > > libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE) > @@ -233,6 +234,7 @@ libxl_domain_build_info = Struct("domain > ("max_vcpus", integer), > ("cur_vcpus", integer), > ("cpumap", libxl_cpumap), > + ("nodemap", libxl_nodemap), > ("tsc_mode", libxl_tsc_mode), > ("max_memkb", MemKB), > ("target_memkb", MemKB), > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -515,7 +515,7 @@ static void parse_config_data(const char > const char *buf; > long l; > XLU_Config *config; > - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids; > + XLU_ConfigList *cpus, *nodes, *vbds, *nics, *pcis, *cvfbs, *cpuids; > int pci_power_mgmt = 0; > int pci_msitranslate = 1; > int pci_permissive = 0; > @@ -628,6 +628,39 @@ static void parse_config_data(const char > free(buf2); > } > > + if (!xlu_cfg_get_list (config, "nodes",&nodes, 0, 0)) { > + int i, n_nodes = 0; > + > + if (libxl_nodemap_alloc(ctx,&b_info->nodemap)) { > + fprintf(stderr, "Unable to allocate nodemap\n"); > + exit(1); > + } > + > + libxl_nodemap_set_none(&b_info->nodemap); > + while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) { > + i = atoi(buf); > + if (!libxl_nodemap_node_valid(&b_info->nodemap, i)) { > + fprintf(stderr, "node %d illegal\n", i); > + exit(1); > + } > + libxl_nodemap_set(&b_info->nodemap, i); > + n_nodes++; > + } > + > + if (n_nodes == 0) > + fprintf(stderr, "No valid node found: no affinity will be set\n"); > + } > + else { > + /* > + * XXX What would a sane default be? I think doing nothing > + * (i.e., relying on cpu-affinity/cpupool ==> the current > + * behavior) should be just fine. > + * That would mean we're saying to the user "if you want us > + * to take care of NUMA, please tell us, maybe just putting > + * 'nodes=auto', but tell us... otherwise we do as usual". > + */ I think that for this patch, doing nothing is fine (which means removing the whole else clause). But once we have the auto placement, I think that "nodes=auto" should be the default. > + } > + > if (!xlu_cfg_get_long (config, "memory",&l, 0)) { > b_info->max_memkb = l * 1024; > b_info->target_memkb = b_info->max_memkb; > diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c > --- a/tools/python/xen/lowlevel/xl/xl.c > +++ b/tools/python/xen/lowlevel/xl/xl.c > @@ -243,6 +243,18 @@ int attrib__libxl_cpumap_set(PyObject *v > return 0; > } > > +int attrib__libxl_nodemap_set(PyObject *v, libxl_nodemap *pptr) > +{ > + int i; > + long node; > + > + for (i = 0; i< PyList_Size(v); i++) { > + node = PyInt_AsLong(PyList_GetItem(v, i)); > + libxl_nodemap_set(pptr, node); > + } > + return 0; > +} > + > int attrib__libxl_file_reference_set(PyObject *v, libxl_file_reference *pptr) > { > return genwrap__string_set(v,&pptr->path); > diff --git a/xen/common/domain.c b/xen/common/domain.c > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -222,6 +222,7 @@ struct domain *domain_create( > > spin_lock_init(&d->node_affinity_lock); > d->node_affinity = NODE_MASK_ALL; > + d->has_node_affinity = 0; > > spin_lock_init(&d->shutdown_lock); > d->shutdown_code = -1; > @@ -337,7 +338,7 @@ void domain_update_node_affinity(struct > { > cpumask_var_t cpumask; > cpumask_var_t online_affinity; > - const cpumask_t *online; > + const cpumask_t *online = cpupool_online_cpumask(d->cpupool); > nodemask_t nodemask = NODE_MASK_NONE; > struct vcpu *v; > unsigned int node; > @@ -350,9 +351,22 @@ void domain_update_node_affinity(struct > return; > } > > - online = cpupool_online_cpumask(d->cpupool); > + spin_lock(&d->node_affinity_lock); > > - spin_lock(&d->node_affinity_lock); > + /* > + * If someone explicitly told us our NUMA affinity, avoid messing > + * that up. Notice, however, that vcpu affinity is still what > + * drives all scheduling decisions. > + * > + * XXX I'm quite sure this is fine wrt to the various v->cpu_affinity > + * (at least, that was the intended design!). Could it be useful > + * to cross-check d->node_affinity against `online`? The basic > + * idea here is "Hey, if you shoot yourself in the foot... You've > + * shot in the foot!", but, you know... > + */ > + if ( d->has_node_affinity ) > + goto out; > + > > for_each_vcpu ( d, v ) > { > @@ -365,6 +379,8 @@ void domain_update_node_affinity(struct > node_set(node, nodemask); > > d->node_affinity = nodemask; > + > +out: > spin_unlock(&d->node_affinity_lock); > > free_cpumask_var(online_affinity); > @@ -372,6 +388,31 @@ void domain_update_node_affinity(struct > } > > > +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity) > +{ > + spin_lock(&d->node_affinity_lock); > + > + /* > + * Being/becoming affine to _no_ nodes is not going to work, > + * let's take it as the `reset node affinity` command. > + */ > + if ( nodes_empty(*affinity) ) > + { > + d->has_node_affinity = 0; > + spin_unlock(&d->node_affinity_lock); > + domain_update_node_affinity(d); > + return 0; > + } > + > + d->has_node_affinity = 1; > + d->node_affinity = *affinity; > + > + spin_unlock(&d->node_affinity_lock); > + > + return 0; > +} > + > + > struct domain *get_domain_by_id(domid_t dom) > { > struct domain *d; > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > --- a/xen/common/domctl.c > +++ b/xen/common/domctl.c > @@ -621,6 +621,27 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc > } > break; > > + case XEN_DOMCTL_node_affinity: > + { > + domid_t dom = op->domain; > + struct domain *d = rcu_lock_domain_by_id(dom); > + nodemask_t new_affinity; > + > + ret = -ESRCH; > + if ( d == NULL ) > + break; > + > + /* XXX We need an xsm_* for this I guess, right? */ Yes. :-) > + > + ret = xenctl_nodemap_to_nodemask(&new_affinity, > +&op->u.node_affinity.nodemap); > + if ( !ret ) > + ret = domain_set_node_affinity(d,&new_affinity); > + > + rcu_unlock_domain(d); > + } > + break; > + > case XEN_DOMCTL_setvcpuaffinity: > case XEN_DOMCTL_getvcpuaffinity: > { > diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c > --- a/xen/common/keyhandler.c > +++ b/xen/common/keyhandler.c > @@ -217,6 +217,14 @@ static void cpuset_print(char *set, int > *set++ = '\0'; > } > > +static void nodeset_print(char *set, int size, const nodemask_t *mask) > +{ > + *set++ = '['; > + set += nodelist_scnprintf(set, size-2, mask); > + *set++ = ']'; > + *set++ = '\0'; > +} > + > static void periodic_timer_print(char *str, int size, uint64_t period) > { > if ( period == 0 ) > @@ -272,6 +280,9 @@ static void dump_domains(unsigned char k > > dump_pageframe_info(d); > > + nodeset_print(tmpstr, sizeof(tmpstr),&d->node_affinity); > + printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr); > + > printk("VCPU information and callbacks for domain %u:\n", > d->domain_id); > for_each_vcpu ( d, v ) > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -278,6 +278,15 @@ typedef struct xen_domctl_getvcpuinfo xe > DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t); > > > +/* Set the NUMA node(s) with which the guest is `affine`. */ > +/* XEN_DOMCTL_node_affinity */ > +struct xen_domctl_node_affinity { > + struct xenctl_map nodemap; /* IN */ > +}; > +typedef struct xen_domctl_node_affinity xen_domctl_node_affinity_t; > +DEFINE_XEN_GUEST_HANDLE(xen_domctl_node_affinity_t); > + > + > /* Get/set which physical cpus a vcpu can execute on. */ > /* XEN_DOMCTL_setvcpuaffinity */ > /* XEN_DOMCTL_getvcpuaffinity */ > @@ -914,6 +923,7 @@ struct xen_domctl { > #define XEN_DOMCTL_set_access_required 64 > #define XEN_DOMCTL_audit_p2m 65 > #define XEN_DOMCTL_set_virq_handler 66 > +#define XEN_DOMCTL_node_affinity 67 > #define XEN_DOMCTL_gdbsx_guestmemio 1000 > #define XEN_DOMCTL_gdbsx_pausevcpu 1001 > #define XEN_DOMCTL_gdbsx_unpausevcpu 1002 > @@ -927,6 +937,7 @@ struct xen_domctl { > struct xen_domctl_getpageframeinfo getpageframeinfo; > struct xen_domctl_getpageframeinfo2 getpageframeinfo2; > struct xen_domctl_getpageframeinfo3 getpageframeinfo3; > + struct xen_domctl_node_affinity node_affinity; > struct xen_domctl_vcpuaffinity vcpuaffinity; > struct xen_domctl_shadow_op shadow_op; > struct xen_domctl_max_mem max_mem; > diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h > --- a/xen/include/xen/nodemask.h > +++ b/xen/include/xen/nodemask.h > @@ -8,8 +8,9 @@ > * See detailed comments in the file linux/bitmap.h describing the > * data type on which these nodemasks are based. > * > - * For details of nodemask_scnprintf() and nodemask_parse(), > - * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c. > + * For details of nodemask_scnprintf(), nodelist_scnpintf() and > + * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse() > + * in lib/bitmap.c. > * > * The available nodemask operations are: > * > @@ -48,6 +49,7 @@ > * unsigned long *nodes_addr(mask) Array of unsigned long's in mask > * > * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing > + * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing > * int nodemask_parse(ubuf, ulen, mask) Parse ascii string as nodemask > * > * for_each_node_mask(node, mask) for-loop node over mask > @@ -280,6 +282,14 @@ static inline int __first_unset_node(con > > #define nodes_addr(src) ((src).bits) > > +#define nodelist_scnprintf(buf, len, src) \ > + __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES) > +static inline int __nodelist_scnprintf(char *buf, int len, > + const nodemask_t *srcp, int nbits) > +{ > + return bitmap_scnlistprintf(buf, len, srcp->bits, nbits); > +} > + > #if 0 > #define nodemask_scnprintf(buf, len, src) \ > __nodemask_scnprintf((buf), (len),&(src), MAX_NUMNODES) > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -346,8 +346,12 @@ struct domain > /* Various mem_events */ > struct mem_event_per_domain *mem_event; > > - /* Currently computed from union of all vcpu cpu-affinity masks. */ > + /* > + * Can be specified by the user. If that is not the case, it is > + * computed from the union of all the vcpu cpu-affinity masks. > + */ > nodemask_t node_affinity; > + int has_node_affinity; There's something that seems a bit clunky about this; but I can't really think of a better way. At every least I'd change the name of the element here to something more descriptive; perhaps "auto_node_affinity" (which would invert the meaning)? > unsigned int last_alloc_node; > spinlock_t node_affinity_lock; > }; > @@ -416,6 +420,7 @@ static inline void get_knownalive_domain > ASSERT(!(atomic_read(&d->refcnt)& DOMAIN_DESTROYED)); > } > > +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity); > void domain_update_node_affinity(struct domain *d); > > struct domain *domain_create(