* [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
@ 2014-03-05 14:36 ` Dario Faggioli
2014-03-05 14:50 ` Juergen Gross
2014-03-05 15:04 ` Jan Beulich
2014-03-05 14:36 ` [PATCH 2/4] libxc: " Dario Faggioli
` (4 subsequent siblings)
5 siblings, 2 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 14:36 UTC (permalink / raw)
To: xen-devel
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Daniel De Graaf
by means of a new hypercall, XEN_DOMCTL_numainfo, which does something
similar to what XEN_SYSCTL_numainfo does, but on a per-domain basis.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
xen/common/domctl.c | 45 +++++++++++++++++++++++++++++++++++
xen/include/public/domctl.h | 22 +++++++++++++++++
xen/xsm/flask/hooks.c | 3 ++
xen/xsm/flask/policy/access_vectors | 2 ++
4 files changed, 72 insertions(+)
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7cf610a..96bf326 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
}
break;
+ case XEN_DOMCTL_numainfo:
+ {
+ uint32_t node, max_node_index, last_online_node;
+ xen_domctl_numainfo_t *ni = &op->u.numainfo;
+ uint64_t *memkb_on_node;
+ struct page_info *page;
+
+ /*
+ * We report back info about the min number of nodes between how
+ * much of them the caller can handle and the number of them that
+ * are actually online.
+ */
+ last_online_node = last_node(node_online_map);
+ max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
+ ni->max_node_index = max_node_index;
+
+ ret = -ENOMEM;
+ memkb_on_node = xzalloc_array(uint64_t, max_node_index);
+ if ( !memkb_on_node )
+ break;
+
+ spin_lock(&d->page_alloc_lock);
+ page_list_for_each(page, &d->page_list)
+ {
+ node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
+ /* For nodes that are offline, don't touch the counter */
+ if ( node <= max_node_index && node_online(node) )
+ memkb_on_node[node]++;
+ }
+ spin_unlock(&d->page_alloc_lock);
+
+ for ( node = 0; node <= max_node_index; node++ )
+ {
+ memkb_on_node[node] <<= (PAGE_SHIFT-10);
+ if ( copy_to_guest_offset(ni->memkb_on_node, node,
+ &memkb_on_node[node], 1) )
+ break;
+ }
+
+ ret = ((node <= max_node_index) || copy_to_guest(u_domctl, op, 1))
+ ? -EFAULT : 0;
+ xfree(memkb_on_node);
+ }
+ break;
+
case XEN_DOMCTL_destroydomain:
{
ret = domain_kill(d);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index f22fe2e..a455d78 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
+/* XEN_DOMCTL_numainfo */
+struct xen_domctl_numainfo {
+ /*
+ * IN: maximum addressable entry in the caller-provided arrays.
+ * OUT: minimum between the maximum addressable entry in the
+ * caller-provided arrays and largest online node identifier
+ * in the system.
+ */
+ uint32_t max_node_index;
+
+ /*
+ * OUT: memory, in Kb, on each node. i-eth element equal to 0 means
+ * either "no memory on node i" or "node i offline".
+ */
+ XEN_GUEST_HANDLE_64(uint64) memkb_on_node;
+};
+typedef struct xen_domctl_numainfo xen_domctl_numainfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_numainfo_t);
+
+
/* XEN_DOMCTL_scheduler_op */
/* Scheduler types. */
#define XEN_SCHEDULER_SEDF 4
@@ -966,6 +986,7 @@ struct xen_domctl {
#define XEN_DOMCTL_getnodeaffinity 69
#define XEN_DOMCTL_set_max_evtchn 70
#define XEN_DOMCTL_cacheflush 71
+#define XEN_DOMCTL_numainfo 72
#define XEN_DOMCTL_gdbsx_guestmemio 1000
#define XEN_DOMCTL_gdbsx_pausevcpu 1001
#define XEN_DOMCTL_gdbsx_unpausevcpu 1002
@@ -986,6 +1007,7 @@ struct xen_domctl {
struct xen_domctl_vcpucontext vcpucontext;
struct xen_domctl_getvcpuinfo getvcpuinfo;
struct xen_domctl_max_vcpus max_vcpus;
+ struct xen_domctl_numainfo numainfo;
struct xen_domctl_scheduler_op scheduler_op;
struct xen_domctl_setdomainhandle setdomainhandle;
struct xen_domctl_setdebugging setdebugging;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 96276ac..edc1d34 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd)
case XEN_DOMCTL_cacheflush:
return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CACHEFLUSH);
+ case XEN_DOMCTL_numainfo:
+ return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__NUMAINFO);
+
default:
printk("flask_domctl: Unknown op %d\n", cmd);
return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index a0ed13d..e218992 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -198,6 +198,8 @@ class domain2
set_max_evtchn
# XEN_DOMCTL_cacheflush
cacheflush
+# XEN_DOMCTL_numainfo
+ numainfo
}
# Similar to class domain, but primarily contains domctls related to HVM domains
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 1/4] xen: " Dario Faggioli
@ 2014-03-05 14:50 ` Juergen Gross
2014-03-05 16:31 ` Dario Faggioli
2014-03-05 15:04 ` Jan Beulich
1 sibling, 1 reply; 33+ messages in thread
From: Juergen Gross @ 2014-03-05 14:50 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On 05.03.2014 15:36, Dario Faggioli wrote:
> by means of a new hypercall, XEN_DOMCTL_numainfo, which does something
> similar to what XEN_SYSCTL_numainfo does, but on a per-domain basis.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> xen/common/domctl.c | 45 +++++++++++++++++++++++++++++++++++
> xen/include/public/domctl.h | 22 +++++++++++++++++
> xen/xsm/flask/hooks.c | 3 ++
> xen/xsm/flask/policy/access_vectors | 2 ++
> 4 files changed, 72 insertions(+)
>
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 7cf610a..96bf326 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> }
> break;
>
> + case XEN_DOMCTL_numainfo:
> + {
> + uint32_t node, max_node_index, last_online_node;
> + xen_domctl_numainfo_t *ni = &op->u.numainfo;
> + uint64_t *memkb_on_node;
> + struct page_info *page;
> +
> + /*
> + * We report back info about the min number of nodes between how
> + * much of them the caller can handle and the number of them that
> + * are actually online.
> + */
> + last_online_node = last_node(node_online_map);
> + max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
> + ni->max_node_index = max_node_index;
> +
> + ret = -ENOMEM;
> + memkb_on_node = xzalloc_array(uint64_t, max_node_index);
> + if ( !memkb_on_node )
> + break;
> +
> + spin_lock(&d->page_alloc_lock);
> + page_list_for_each(page, &d->page_list)
> + {
> + node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
> + /* For nodes that are offline, don't touch the counter */
> + if ( node <= max_node_index && node_online(node) )
> + memkb_on_node[node]++;
> + }
This loop will run quite a long time for huge domains. Wouldn't it be better
to do the accounting during page allocation?
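For illustration, the sort of thing being suggested might look like the
sketch below. This is not existing Xen code: the pages_per_node field and
the helper are hypothetical, assumed to sit next to tot_pages in struct
domain and to be protected by d->page_alloc_lock.

    /*
     * Hypothetical per-domain counter, one slot per NUMA node:
     *
     *     unsigned int pages_per_node[MAX_NUMNODES];
     *
     * updated wherever pages are assigned to or freed from the domain.
     */
    static void domain_adjust_node_pages(struct domain *d,
                                         struct page_info *pg,
                                         unsigned int nr, bool_t add)
    {
        unsigned int node = phys_to_nid(page_to_maddr(pg));

        ASSERT(spin_is_locked(&d->page_alloc_lock));
        if ( add )
            d->pages_per_node[node] += nr;
        else
            d->pages_per_node[node] -= nr;
    }

With something like that in place, the XEN_DOMCTL_numainfo handler would
only have to copy out a kB-converted snapshot of the array, with no long
loop over the page list at all.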
> + spin_unlock(&d->page_alloc_lock);
> +
> + for ( node = 0; node <= max_node_index; node++ )
> + {
> + memkb_on_node[node] <<= (PAGE_SHIFT-10);
If you already use a 64 bit element you could use bytes as unit.
> + if ( copy_to_guest_offset(ni->memkb_on_node, node,
> + &memkb_on_node[node], 1) )
> + break;
> + }
> +
> + ret = ((node <= max_node_index) || copy_to_guest(u_domctl, op, 1))
> + ? -EFAULT : 0;
> + xfree(memkb_on_node);
> + }
> + break;
> +
> case XEN_DOMCTL_destroydomain:
> {
> ret = domain_kill(d);
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index f22fe2e..a455d78 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>
>
> +/* XEN_DOMCTL_numainfo */
> +struct xen_domctl_numainfo {
> + /*
> + * IN: maximum addressable entry in the caller-provided arrays.
> + * OUT: minimum between the maximum addressable entry in the
> + * caller-provided arrays and largest online node identifier
> + * in the system.
> + */
> + uint32_t max_node_index;
Add explicit padding?
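Presumably something along these lines (just a sketch; the pad field name
is made up), so the layout is identical for 32- and 64-bit callers without
relying on compiler-inserted padding:

    struct xen_domctl_numainfo {
        uint32_t max_node_index;
        uint32_t pad;                      /* explicit padding, should be 0 */
        XEN_GUEST_HANDLE_64(uint64) memkb_on_node; /* 8-byte aligned handle */
    };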
> +
> + /*
> + * OUT: memory, in Kb, on each node. i-eth element equal to 0 means
> + * either "no memory on node i" or "node i offline".
> + */
> + XEN_GUEST_HANDLE_64(uint64) memkb_on_node;
> +};
> +typedef struct xen_domctl_numainfo xen_domctl_numainfo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_numainfo_t);
> +
> +
> /* XEN_DOMCTL_scheduler_op */
> /* Scheduler types. */
> #define XEN_SCHEDULER_SEDF 4
> @@ -966,6 +986,7 @@ struct xen_domctl {
> #define XEN_DOMCTL_getnodeaffinity 69
> #define XEN_DOMCTL_set_max_evtchn 70
> #define XEN_DOMCTL_cacheflush 71
> +#define XEN_DOMCTL_numainfo 72
> #define XEN_DOMCTL_gdbsx_guestmemio 1000
> #define XEN_DOMCTL_gdbsx_pausevcpu 1001
> #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
> @@ -986,6 +1007,7 @@ struct xen_domctl {
> struct xen_domctl_vcpucontext vcpucontext;
> struct xen_domctl_getvcpuinfo getvcpuinfo;
> struct xen_domctl_max_vcpus max_vcpus;
> + struct xen_domctl_numainfo numainfo;
> struct xen_domctl_scheduler_op scheduler_op;
> struct xen_domctl_setdomainhandle setdomainhandle;
> struct xen_domctl_setdebugging setdebugging;
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 96276ac..edc1d34 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -727,6 +727,9 @@ static int flask_domctl(struct domain *d, int cmd)
> case XEN_DOMCTL_cacheflush:
> return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CACHEFLUSH);
>
> + case XEN_DOMCTL_numainfo:
> + return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__NUMAINFO);
> +
> default:
> printk("flask_domctl: Unknown op %d\n", cmd);
> return -EPERM;
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index a0ed13d..e218992 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -198,6 +198,8 @@ class domain2
> set_max_evtchn
> # XEN_DOMCTL_cacheflush
> cacheflush
> +# XEN_DOMCTL_numainfo
> + numainfo
> }
>
> # Similar to class domain, but primarily contains domctls related to HVM domains
Juergen
--
Juergen Gross Principal Developer Operating Systems
PBG PDG ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932
Fujitsu e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 14:50 ` Juergen Gross
@ 2014-03-05 16:31 ` Dario Faggioli
2014-03-05 16:49 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 16:31 UTC (permalink / raw)
To: Juergen Gross
Cc: Ian Campbell, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On mer, 2014-03-05 at 15:50 +0100, Juergen Gross wrote:
> On 05.03.2014 15:36, Dario Faggioli wrote:
> > diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> > index 7cf610a..96bf326 100644
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> > + spin_lock(&d->page_alloc_lock);
> > + page_list_for_each(page, &d->page_list)
> > + {
> > + node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
> > + /* For nodes that are offline, don't touch the counter */
> > + if ( node <= max_node_index && node_online(node) )
> > + memkb_on_node[node]++;
> > + }
>
> This loop will run quite a long time for huge domains. Wouldn't it be better
> to do the accounting during page allocation?
>
See the reply to Jan (and feel free to chime in, of course. :-P)
> > + spin_unlock(&d->page_alloc_lock);
> > +
> > + for ( node = 0; node <= max_node_index; node++ )
> > + {
> > + memkb_on_node[node] <<= (PAGE_SHIFT-10);
>
> If you already use a 64 bit element you could use bytes as unit.
>
Yeah, I was trying to be consistent with other calls and interfaces, but
existing code uses pretty much every type for this: uint32,
unsigned int, unsigned long... :-O
At least in libxl we have (tools/libxl/libxl_types.idl):
#
# Specific integer types
#
MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT")
And hence:
typedef struct libxl_domain_build_info {
int max_vcpus;
libxl_bitmap avail_vcpus;
libxl_bitmap cpumap;
libxl_bitmap nodemap;
libxl_defbool numa_placement;
libxl_tsc_mode tsc_mode;
uint64_t max_memkb;
uint64_t target_memkb;
uint64_t video_memkb;
uint64_t shadow_memkb;
...
Having looked at this again, I wonder whether sticking to Kbs and
switching to uint32 wouldn't be the best solution...
> > +/* XEN_DOMCTL_numainfo */
> > +struct xen_domctl_numainfo {
> > + /*
> > + * IN: maximum addressable entry in the caller-provided arrays.
> > + * OUT: minimum between the maximum addressable entry in the
> > + * caller-provided arrays and largest online node identifier
> > + * in the system.
> > + */
> > + uint32_t max_node_index;
>
> Add explicit padding?
>
Let's see. As per the reply to Jan, this interface may change.
Anyway, I'll do so if it doesn't.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 16:31 ` Dario Faggioli
@ 2014-03-05 16:49 ` Jan Beulich
2014-03-05 17:14 ` Dario Faggioli
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2014-03-05 16:49 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel,
Daniel De Graaf
>>> On 05.03.14 at 17:31, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> Having looked at this again, I wonder whether sticking to Kbs and
> switching to uint32 wouldn't be the best solution...
Staying with kb _and_ switching to uint32 would limit the interface
to 4Tb.
Jan
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 16:49 ` Jan Beulich
@ 2014-03-05 17:14 ` Dario Faggioli
0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 17:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel,
Daniel De Graaf
On mer, 2014-03-05 at 16:49 +0000, Jan Beulich wrote:
> >>> On 05.03.14 at 17:31, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > Having looked at this again, I wonder whether sticking to Kbs and
> > switching to uint32 wouldn't be the best solution...
>
> Staying with kb _and_ switching to uint32 would limit the interface
> to 4Tb.
>
Right. u64 and bytes it is then, I guess.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 1/4] xen: " Dario Faggioli
2014-03-05 14:50 ` Juergen Gross
@ 2014-03-05 15:04 ` Jan Beulich
2014-03-05 16:13 ` Dario Faggioli
1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2014-03-05 15:04 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
xen-devel, Daniel De Graaf
>>> On 05.03.14 at 15:36, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> }
> break;
>
> + case XEN_DOMCTL_numainfo:
> + {
> + uint32_t node, max_node_index, last_online_node;
> + xen_domctl_numainfo_t *ni = &op->u.numainfo;
> + uint64_t *memkb_on_node;
> + struct page_info *page;
> +
> + /*
> + * We report back info about the min number of nodes between how
> + * much of them the caller can handle and the number of them that
> + * are actually online.
> + */
> + last_online_node = last_node(node_online_map);
> + max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
> + ni->max_node_index = max_node_index;
> +
> + ret = -ENOMEM;
> + memkb_on_node = xzalloc_array(uint64_t, max_node_index);
max_node_index + 1
> + if ( !memkb_on_node )
> + break;
> +
> + spin_lock(&d->page_alloc_lock);
> + page_list_for_each(page, &d->page_list)
No new non-preemptable potentially long running loops like this,
please. See XSA-77.
> + {
> + node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
Please don't open-code pfn_to_paddr()/page_to_maddr().
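I.e., presumably one of these (sketch):

    node = phys_to_nid(page_to_maddr(page));
    /* or, equivalently: */
    node = phys_to_nid(pfn_to_paddr(page_to_mfn(page)));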
> + /* For nodes that are offline, don't touch the counter */
> + if ( node <= max_node_index && node_online(node) )
How can a node a running domain has memory on be offline?
Looks like you want an assertion here instead.
> + memkb_on_node[node]++;
> + }
> + spin_unlock(&d->page_alloc_lock);
> +
> + for ( node = 0; node <= max_node_index; node++ )
> + {
> + memkb_on_node[node] <<= (PAGE_SHIFT-10);
> + if ( copy_to_guest_offset(ni->memkb_on_node, node,
> + &memkb_on_node[node], 1) )
> + break;
I'd suggest to do the copying in one go after the loop.
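I.e. something like this (a sketch of the suggestion, reusing the
variables already in the hunk above):

    for ( node = 0; node <= max_node_index; node++ )
        memkb_on_node[node] <<= (PAGE_SHIFT - 10);

    ret = (copy_to_guest(ni->memkb_on_node, memkb_on_node,
                         max_node_index + 1) ||
           copy_to_guest(u_domctl, op, 1)) ? -EFAULT : 0;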
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus
> xen_domctl_max_vcpus_t;
> DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>
>
> +/* XEN_DOMCTL_numainfo */
> +struct xen_domctl_numainfo {
> + /*
> + * IN: maximum addressable entry in the caller-provided arrays.
> + * OUT: minimum between the maximum addressable entry in the
> + * caller-provided arrays and largest online node identifier
> + * in the system.
> + */
> + uint32_t max_node_index;
With that IN/OUT specification (and the matching implementation
above), how would the caller know it passed too small a value to
fit all information?
Jan
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 15:04 ` Jan Beulich
@ 2014-03-05 16:13 ` Dario Faggioli
2014-03-05 16:44 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 16:13 UTC (permalink / raw)
To: Jan Beulich
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
xen-devel, Daniel De Graaf
On mer, 2014-03-05 at 15:04 +0000, Jan Beulich wrote:
> >>> On 05.03.14 at 15:36, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> > }
> > break;
> >
> > + case XEN_DOMCTL_numainfo:
> > + {
> > + uint32_t node, max_node_index, last_online_node;
> > + xen_domctl_numainfo_t *ni = &op->u.numainfo;
> > + uint64_t *memkb_on_node;
> > + struct page_info *page;
> > +
> > + /*
> > + * We report back info about the min number of nodes between how
> > + * much of them the caller can handle and the number of them that
> > + * are actually online.
> > + */
> > + last_online_node = last_node(node_online_map);
> > + max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
> > + ni->max_node_index = max_node_index;
> > +
> > + ret = -ENOMEM;
> > + memkb_on_node = xzalloc_array(uint64_t, max_node_index);
>
> max_node_index + 1
>
Ouch. Will fix, thanks.
> > + if ( !memkb_on_node )
> > + break;
> > +
> > + spin_lock(&d->page_alloc_lock);
> > + page_list_for_each(page, &d->page_list)
>
> No new non-preemptable potentially long running loops like this,
> please. See XSA-77.
>
Not super familiar with XSAs, but do you perhaps mean 45 ("Several long
latency operations are not preemptible",
xenbits.xenproject.org/xsa/advisory-45.html)? 77 seems to be about
"Hypercalls exposed to privilege rings 1 and 2 of HVM guests"...
In any case, I see what you mean, and Juergen was also pointing out that
this is going to take a long time. I of course agree, but when
implementing I was not sure what the best solution was.
I initially thought of, as Juergen says, instrumenting page allocation
and adding the accounting there. However, I think that means doing
something very similar to having a MAX_NUMNODES big array for each
domain (or are there other ways?).
I think such information may turn out handy for other things in the
future but, for now, it'd be adding it for the sake of this hypercall
only... Is that acceptable? Any other ideas? Am I missing something?
> > + {
> > + node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
>
> Please don't open-code pfn_to_paddr()/page_to_maddr().
>
Ok. BTW, for this, I looked at what dump_numa() in xen/arch/x86/numa.c
does.
Should we fix that too -- and I mean wrt both the open coding and the
non-preemptability? Or does it not matter in this case, as it's just a
debug key handler?
> > + /* For nodes that are offline, don't touch the counter */
> > + if ( node <= max_node_index && node_online(node) )
>
> How can a node a running domain has memory on be offline?
> Looks like you want an assertion here instead.
>
Right. I did put this here to be on the safe side, and I agree that an
ASSERT() is much more correct and effective for that. Will do.
> > + memkb_on_node[node]++;
> > + }
> > + spin_unlock(&d->page_alloc_lock);
> > +
> > + for ( node = 0; node <= max_node_index; node++ )
> > + {
> > + memkb_on_node[node] <<= (PAGE_SHIFT-10);
> > + if ( copy_to_guest_offset(ni->memkb_on_node, node,
> > + &memkb_on_node[node], 1) )
> > + break;
>
> I'd suggest to do the copying in one go after the loop.
>
Also ok.
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus
> > xen_domctl_max_vcpus_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >
> >
> > +/* XEN_DOMCTL_numainfo */
> > +struct xen_domctl_numainfo {
> > + /*
> > + * IN: maximum addressable entry in the caller-provided arrays.
> > + * OUT: minimum between the maximum addressable entry in the
> > + * caller-provided arrays and largest online node identifier
> > + * in the system.
> > + */
> > + uint32_t max_node_index;
>
> With that IN/OUT specification (and the matching implementation
> above), how would the caller know it passed too small a value to
> fit all information?
>
It doesn't. I designed it to be similar to XEN_SYSCTL_numainfo, which
retrieves _system_wide_ NUMA information. Should I use a new, OUT only,
parameter for the required number of elements (or index of the largest),
so that the caller can compare the two and figure things out?
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1/4] xen: report how much memory a domain has on each NUMA node
2014-03-05 16:13 ` Dario Faggioli
@ 2014-03-05 16:44 ` Jan Beulich
0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2014-03-05 16:44 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
xen-devel, Daniel De Graaf
>>> On 05.03.14 at 17:13, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On mer, 2014-03-05 at 15:04 +0000, Jan Beulich wrote:
>> >>> On 05.03.14 at 15:36, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>> > + if ( !memkb_on_node )
>> > + break;
>> > +
>> > + spin_lock(&d->page_alloc_lock);
>> > + page_list_for_each(page, &d->page_list)
>>
>> No new non-preemptable potentially long running loops like this,
>> please. See XSA-77.
>>
> Not super familiar with XSAs, but do you perhaps mean 45 ("Several long
> latency operations are not preemptible",
> xenbits.xenproject.org/xsa/advisory-45.html)? 77 seems to be about
> "Hypercalls exposed to privilege rings 1 and 2 of HVM guests"...
XSA-77 is "Disaggregated domain management security status".
Is there anywhere this isn't shown that way?
> In any case, I see what you mean, and Juergen was also pointing out that
> this is going to take a long time. I of course agree, but when
> implementing I was not sure what the best solution was.
>
> I initially thought of, as Juergen says, instrumenting page allocation
> and adding the accounting there. However, I think that means doing
> something very similar to having a MAX_NUMNODES big array for each
> domain (or are there other ways?).
That's an option, and MAX_NUMNODES shouldn't be that large.
Another option is to check for preemption every so many
iterations. Your only problem then is where to store the
intermediate result.
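A rough sketch of that second option (where to stash the partial counts
and the resume position is exactly the open question, so the sketch only
marks the spot):

    unsigned int i = 0;
    int preempted = 0;

    spin_lock(&d->page_alloc_lock);
    page_list_for_each(page, &d->page_list)
    {
        /* ... per-node accounting as in the patch ... */

        /* Every so many pages, see whether we should give the CPU up. */
        if ( !(++i & 0xfff) && hypercall_preempt_check() )
        {
            preempted = 1;
            break;
        }
    }
    spin_unlock(&d->page_alloc_lock);

    if ( preempted )
        /*
         * The counts accumulated so far, and where to resume in
         * d->page_list, would have to be stored somewhere before this
         * point -- which is the problem mentioned above.
         */
        ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                            "h", u_domctl);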
>> > + {
>> > + node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
>>
>> Please don't open-code pfn_to_paddr()/page_to_maddr().
>>
> Ok. BTW, for this, I looked at what dump_numa() in xen/arch/x86/numa.c
> does.
>
> Should we fix that too -- and I mean wrt both the open coding and the
> non-preemptability? Or does it not matter in this case, as it's just a
> debug key handler?
Non-preemptability doesn't matter for that very reason. But
fixing open-coded constructs would certainly be nice.
>> > --- a/xen/include/public/domctl.h
>> > +++ b/xen/include/public/domctl.h
>> > @@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus
>> > xen_domctl_max_vcpus_t;
>> > DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>> >
>> >
>> > +/* XEN_DOMCTL_numainfo */
>> > +struct xen_domctl_numainfo {
>> > + /*
>> > + * IN: maximum addressable entry in the caller-provided arrays.
>> > + * OUT: minimum between the maximum addressable entry in the
>> > + * caller-provided arrays and largest online node identifier
>> > + * in the system.
>> > + */
>> > + uint32_t max_node_index;
>>
>> With that IN/OUT specification (and the matching implementation
>> above), how would the caller know it passed too small a value to
>> fit all information?
>>
> It doesn't. I designed it to be similar to XEN_SYSCTL_numainfo, which
> retrieves _system_wide_ NUMA information. Should I use a new, OUT only,
> parameter for the required number of elements (or index of the largest),
> so that the caller can compare the two and figure things out?
Either that, or simply always update max_node_index to the
needed number of elements (with the comment suitably updated)
- after all you can be sure the caller knows the number it passed in.
Jan
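In code, the second option Jan describes might look like this (a sketch;
passed_in is just an illustrative name for the value the caller supplied):

    /* Hypervisor side: fill only as much as the caller can take ... */
    max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
    /* ... but always report back the index that would have been needed. */
    ni->max_node_index = last_online_node;

    /* Caller side: detect truncation by comparing against what it passed. */
    if ( domctl.u.numainfo.max_node_index > passed_in )
        /* array was too small: grow it and retry, or accept partial data */;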
* [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
2014-03-05 14:36 ` [PATCH 1/4] xen: " Dario Faggioli
@ 2014-03-05 14:36 ` Dario Faggioli
2014-03-05 15:05 ` Andrew Cooper
2014-03-10 16:39 ` Ian Jackson
2014-03-05 14:36 ` [PATCH 3/4] libxl: " Dario Faggioli
` (3 subsequent siblings)
5 siblings, 2 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 14:36 UTC (permalink / raw)
To: xen-devel
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Daniel De Graaf
by means of a new interface: xc_domain_numainfo().
The caller is expected to allocate an array for the call to fill,
with the results of the XEN_DOMCTL_numainfo hypercall. The size of
the array is also passed to the function, which then returns back
the number of elements that have actually been filled by Xen.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
tools/libxc/xc_domain.c | 28 ++++++++++++++++++++++++++++
tools/libxc/xenctrl.h | 18 ++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..a2b3c07 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -362,6 +362,34 @@ int xc_domain_getinfo(xc_interface *xch,
return nr_doms;
}
+int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
+ int *max_nodes, uint64_t *memkbs)
+{
+ DECLARE_DOMCTL;
+ DECLARE_HYPERCALL_BOUNCE(memkbs, sizeof(uint64_t) * (*max_nodes),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+ int ret = 0;
+
+ if ( xc_hypercall_bounce_pre(xch, memkbs) )
+ {
+ PERROR("Could not allocate bounce buffer for DOMCTL_domain_numainfo");
+ return -1;
+ }
+
+ domctl.cmd = XEN_DOMCTL_numainfo;
+ domctl.domain = (domid_t)domid;
+ domctl.u.numainfo.max_node_index = *max_nodes - 1;
+ set_xen_guest_handle(domctl.u.numainfo.memkb_on_node, memkbs);
+
+ ret = do_domctl(xch, &domctl);
+
+ *max_nodes = domctl.u.numainfo.max_node_index + 1;
+
+ xc_hypercall_bounce_post(xch, memkbs);
+
+ return ret;
+}
+
int xc_domain_getinfolist(xc_interface *xch,
uint32_t first_domain,
unsigned int max_domains,
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..845d183 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -657,6 +657,24 @@ int xc_domain_getinfolist(xc_interface *xch,
xc_domaininfo_t *info);
/**
+ * This function tells how much memory a domain has allocated on each
+ * online NUMA node of the host. The information is stored in an array
+ * that the caller provides, along with its size. The function updates
+ * the latter parameter with the number of elements in the array that
+ * have been actually filled.
+ *
+ * @param xch a handle to an open hypervisor interface
+ * @param domid the domain id for which we want the information
+ * @param max_nodes as an input, the size of the memkbs array; as an
+ * output, the number of filled elements in it
+ * @param memkbs an array with, in the i-eth element, the memory, in
+ * Kb, allocated for the domain on the i-eth NUMA node
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
+ int *max_nodes, uint64_t *memkbs);
+
+/**
* This function set p2m for broken page
* &parm xch a handle to an open hypervisor interface
* @parm domid the domain id which broken page belong to
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 2/4] libxc: " Dario Faggioli
@ 2014-03-05 15:05 ` Andrew Cooper
2014-03-05 15:40 ` Dario Faggioli
2014-03-10 16:39 ` Ian Jackson
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2014-03-05 15:05 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Juergen Gross, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On 05/03/14 14:36, Dario Faggioli wrote:
> by means of a new interface: xc_domain_numainfo().
>
> The caller is expected to allocate an array for the call to fill,
> with the results of the XEN_DOMCTL_numainfo hypercall. The size of
> the array is also passed to the function, which then returns back
> the number of elements that have actually been filled by Xen.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> tools/libxc/xc_domain.c | 28 ++++++++++++++++++++++++++++
> tools/libxc/xenctrl.h | 18 ++++++++++++++++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..a2b3c07 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -362,6 +362,34 @@ int xc_domain_getinfo(xc_interface *xch,
> return nr_doms;
> }
>
> +int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
> + int *max_nodes, uint64_t *memkbs)
max_nodes is an unsigned quantity. libxc is quite fast and loose with
this, but let's not propagate wrongness.
> +{
> + DECLARE_DOMCTL;
> + DECLARE_HYPERCALL_BOUNCE(memkbs, sizeof(uint64_t) * (*max_nodes),
> + XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> + int ret = 0;
Don't need the initialiser.
> +
> + if ( xc_hypercall_bounce_pre(xch, memkbs) )
> + {
> + PERROR("Could not allocate bounce buffer for DOMCTL_domain_numainfo");
> + return -1;
> + }
> +
> + domctl.cmd = XEN_DOMCTL_numainfo;
> + domctl.domain = (domid_t)domid;
> + domctl.u.numainfo.max_node_index = *max_nodes - 1;
> + set_xen_guest_handle(domctl.u.numainfo.memkb_on_node, memkbs);
> +
> + ret = do_domctl(xch, &domctl);
> +
> + *max_nodes = domctl.u.numainfo.max_node_index + 1;
If the domctl fails, this should not be written back to *max_nodes.
~Andrew
> +
> + xc_hypercall_bounce_post(xch, memkbs);
> +
> + return ret;
> +}
> +
> int xc_domain_getinfolist(xc_interface *xch,
> uint32_t first_domain,
> unsigned int max_domains,
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 13f816b..845d183 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -657,6 +657,24 @@ int xc_domain_getinfolist(xc_interface *xch,
> xc_domaininfo_t *info);
>
> /**
> + * This function tells how much memory a domain has allocated on each
> + * online NUMA node of the host. The information is stored in an array
> + * that the caller provides, along with its size. The function updates
> + * the latter parameter with the number of elements in the array that
> + * have been actually filled.
> + *
> + * @param xch a handle to an open hypervisor interface
> + * @param domid the domain id for which we want the information
> + * @param max_nodes as an input, the size of the memkbs array; as an
> + * output, the number of filled elements in it
> + * @param memkbs an array with, in the i-eth element, the memory, in
> + * Kb, allocated for the domain on the i-eth NUMA node
> + * @return 0 on success, -1 on failure
> + */
> +int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
> + int *max_nodes, uint64_t *memkbs);
> +
> +/**
> * This function set p2m for broken page
> * &parm xch a handle to an open hypervisor interface
> * @parm domid the domain id which broken page belong to
>
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-05 15:05 ` Andrew Cooper
@ 2014-03-05 15:40 ` Dario Faggioli
0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 15:40 UTC (permalink / raw)
To: Andrew Cooper
Cc: Ian Campbell, Juergen Gross, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On mer, 2014-03-05 at 15:05 +0000, Andrew Cooper wrote:
> On 05/03/14 14:36, Dario Faggioli wrote:
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..a2b3c07 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -362,6 +362,34 @@ int xc_domain_getinfo(xc_interface *xch,
> > return nr_doms;
> > }
> >
> > +int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
> > + int *max_nodes, uint64_t *memkbs)
>
> max_nodes is an unsigned quantity. libxc is quite fast and loose with
> this, but let's not propagate wrongness.
>
True. The bummer is that, as discussed already for other reasons, if I
want to use the value returned from libxl_get_max_nodes() as max_nodes,
that's an integer.
Anyway, I'll work around that, I guess.
> > +{
> > + DECLARE_DOMCTL;
> > + DECLARE_HYPERCALL_BOUNCE(memkbs, sizeof(uint64_t) * (*max_nodes),
> > + XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > + int ret = 0;
>
> Don't need the initialiser.
>
Ok.
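Folding these review comments in, the wrapper might end up looking
something like this (just a sketch of the direction, not the final patch):

    int xc_domain_numainfo(xc_interface *xch, uint32_t domid,
                           unsigned int *max_nodes, uint64_t *memkbs)
    {
        DECLARE_DOMCTL;
        DECLARE_HYPERCALL_BOUNCE(memkbs, sizeof(uint64_t) * (*max_nodes),
                                 XC_HYPERCALL_BUFFER_BOUNCE_OUT);
        int ret;

        if ( xc_hypercall_bounce_pre(xch, memkbs) )
        {
            PERROR("Could not allocate bounce buffer for DOMCTL_domain_numainfo");
            return -1;
        }

        domctl.cmd = XEN_DOMCTL_numainfo;
        domctl.domain = (domid_t)domid;
        domctl.u.numainfo.max_node_index = *max_nodes - 1;
        set_xen_guest_handle(domctl.u.numainfo.memkb_on_node, memkbs);

        ret = do_domctl(xch, &domctl);

        /* Only report back how much was filled if the domctl succeeded. */
        if ( !ret )
            *max_nodes = domctl.u.numainfo.max_node_index + 1;

        xc_hypercall_bounce_post(xch, memkbs);

        return ret;
    }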
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 2/4] libxc: " Dario Faggioli
2014-03-05 15:05 ` Andrew Cooper
@ 2014-03-10 16:39 ` Ian Jackson
2014-03-10 17:07 ` Dario Faggioli
1 sibling, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-10 16:39 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("[PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> by means of a new interface: xc_domain_numainfo().
>
> The caller is expected to allocate an array for the call to fill,
> with the results of the XEN_DOMCTL_numainfo hypercall. The size of
> the array is also passed to the function, which then returns back
> the number of elements that have actually been filled by Xen.
Is there a need to get this data in a way which is coherent with the
domain info list memory usage data ?
What are callers supposed to do about discrepancies between the two
sets of information ?
(I forget: is the choice of node to allocate memory from up to the
guest, or the host ? Is there any way to say a guest can use no more
than X1 amount on node 1 and no more than X2 on node 2 ?)
Ian.
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-10 16:39 ` Ian Jackson
@ 2014-03-10 17:07 ` Dario Faggioli
2014-03-10 17:09 ` Andrew Cooper
2014-03-10 17:20 ` Ian Jackson
0 siblings, 2 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-10 17:07 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On lun, 2014-03-10 at 16:39 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> > by means of a new interface: xc_domain_numainfo().
> >
> > The caller is expected to allocate an array for the call to fill,
> > with the results of the XEN_DOMCTL_numainfo hypercall. The size of
> > the array is also passed to the function, which then returns back
> > the number of elements that have actually been filled by Xen.
>
> Is there a need to get this data in a way which is coherent with the
> domain info list memory usage data ?
>
You mean the output of `xl list' and `xl list -l'? I mean, these:
root@Zhaman:~# xl list 1
Name                                ID   Mem VCPUs      State   Time(s)
vm-test                              1  4096    16     -b----       6.8
root@Zhaman:~# xl list -l 1
[
...
"b_info": {
...
"max_memkb": 4194304,
"target_memkb": 4194304,
"video_memkb": -1,
"shadow_memkb": 49152,
...
]
?
If yes, I think it is coherent, and if not, yes, it should be and that's
a bug... have you found any occurrence of such a thing?
> What are callers supposed to do about discrepancies between the two
> sets of information ?
>
I'm sorry, what discrepancies?
root@Zhaman:~# xl numainfo 1
NODE Affinity: all
Memory:
Node 0: 2097152 Kb
Node 1: 2097152 Kb
root@Zhaman:~# bc -l
bc 1.06.95
2097152+2097152
4194304
Or was it something else you were talking about?
> (I forget: is the choice of node to allocate memory from up to the
> guest, or the host ?
>
It's up to the toolstack. Then, something that can happen is that Xen
does not have enough memory on the specified nodes, but that's another
story.
> Is there any way to say a guest can use no more
> than X1 amount on node 1 and no more than X2 on node 2 ?)
>
There is none right now, but that would be a nice addition, and there will
be soon: for instance, Elena has already implemented it, for the sake of
vNUMA (but it can be used in non-vNUMA contexts too).
Did I answer your questions?
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-10 17:07 ` Dario Faggioli
@ 2014-03-10 17:09 ` Andrew Cooper
2014-03-10 17:20 ` Ian Jackson
1 sibling, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2014-03-10 17:09 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Juergen Gross, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On 10/03/14 17:07, Dario Faggioli wrote:
> On lun, 2014-03-10 at 16:39 +0000, Ian Jackson wrote:
>> Dario Faggioli writes ("[PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
>>
>> What are callers supposed to do about discrepancies between the two
>> sets of information ?
>>
> I'm sorry, what discrepancies?
Without holding a big spinlock in Xen, the results from each hypercall
might not sum to the size of the machine. There is no acceptable
solution for this AFAICT.
~Andrew
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-10 17:07 ` Dario Faggioli
2014-03-10 17:09 ` Andrew Cooper
@ 2014-03-10 17:20 ` Ian Jackson
2014-03-10 17:35 ` Dario Faggioli
1 sibling, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-10 17:20 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> On lun, 2014-03-10 at 16:39 +0000, Ian Jackson wrote:
> > Is there a need to get this data in a way which is coherent with the
> > domain info list memory usage data ?
> >
> You mean the output of `xl list' and `xl list -l'? I mean, these:
Yes.
> If yes, I think it is coherent, and if not, yes, it should be and that's
> a bug... have you found any occurrence of such a thing?
I mean, what about races ?
> > What are callers supposed to do about discrepancies between the two
> > sets of information ?
> >
> I'm sorry, what discrepancies?
Discrepancies occurring due to the hypercalls being made at different
times.
Ian.
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-10 17:20 ` Ian Jackson
@ 2014-03-10 17:35 ` Dario Faggioli
2014-03-11 11:15 ` Ian Jackson
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-10 17:35 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On lun, 2014-03-10 at 17:20 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> > I'm sorry, what discrepancies?
>
> Discrepancies occurring due to the hypercalls being made at different
> times.
>
Right. I wasn't even thinking about this since, as Andrew said, I don't
think there is anything we can do about it!
However, my point above kind of was that if, at some point, you issue the
XEN_DOMCTL_domain_numainfo hypercall and then sum what you get in each
cell of the array, you have the total amount of memory which, *if*
nothing changed, is exactly what you got from an instance of the other
one, issued some time before (or later).
If something changed, then yes you're screwed, but the bottom line, I
think is, if you think you may need the information broken down in
nodes, use XEN_DOMCTL_domain_numainfo, which gives you that and,
indirectly, also the total size.
Does that make sense?
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-10 17:35 ` Dario Faggioli
@ 2014-03-11 11:15 ` Ian Jackson
2014-03-11 17:37 ` Dario Faggioli
0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-11 11:15 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> If something changed, then yes you're screwed, but the bottom line, I
> think is, if you think you may need the information broken down in
> nodes, use XEN_DOMCTL_domain_numainfo, which gives you that and,
> indirectly, also the total size.
But it gives you that for only one domain; whereas the domain list gives
you all domains at once.
We have had problems before with toolstack memory balancing and
management algorithms needing information which wasn't properly
available. So I think I want to see an explanation of how this would be
used in an actual memory balancing/assignment algorithm.
Ian.
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-11 11:15 ` Ian Jackson
@ 2014-03-11 17:37 ` Dario Faggioli
2014-03-11 18:16 ` Ian Jackson
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-11 17:37 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On mar, 2014-03-11 at 11:15 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> > If something changed, then yes you're screwed, but the bottom line, I
> > think is, if you think you may need the information broken down in
> > nodes, use XEN_DOMCTL_domain_numainfo, which gives you that and,
> > indirectly, also the total size.
>
> But it gives you that for only one domain; whereas the domain list gives
> you all domains at once.
>
Right. Not sure what you're after (see below), but yes, it's probably at
least a more accurate snapshot of the system... I'll try to move this
into dominfo.
> We have had problems before with toolstack memory balancing and
> managemetn algorithms needing information which wasn't properly
> available. So I think I want to see an explanation how this would be
> used in an actual memory balancing/assignment algorith.
>
I'm not sure why you're asking this. For now, this is not going to be
used in any algorithm, it's pure information reporting.
Of course, the system administrator, or a toolstack at a higher level
than libxl, can use the info for making some decisions, but then full
consistency is up to them, I guess... mostly because I don't think there
is much we can do.
In the current NUMA placement implementation, in libxl, what we need to
know is how much memory is free on each node, rather than how much
memory each domain occupies there.
So, what is it that I should try to show you?
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-11 17:37 ` Dario Faggioli
@ 2014-03-11 18:16 ` Ian Jackson
2014-03-11 19:04 ` Dario Faggioli
0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-11 18:16 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> On mar, 2014-03-11 at 11:15 +0000, Ian Jackson wrote:
> > We have had problems before with toolstack memory balancing and
> management algorithms needing information which wasn't properly
> available. So I think I want to see an explanation of how this would be
> used in an actual memory balancing/assignment algorithm.
> >
> I'm not sure why you're asking this. For now, this is not going to be
> used in any algorithm, it's pure information reporting.
An API is for life, not just "for now".
> In the current NUMA placement implementation, in libxl, what we need to
> know is how much memory is free on each node, rather than how much
> memory each domain occupies there.
>
> So, what is it that I should try to show you?
I want a sketch of a race-free memory assignment approach that works
with ballooning (and domain self-directed numa migration), and doesn't
depend on arbitrating memory usage by having Xen fail certain memory
allocations.
Ian.
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-11 18:16 ` Ian Jackson
@ 2014-03-11 19:04 ` Dario Faggioli
2014-03-13 11:54 ` George Dunlap
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-11 19:04 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On mar, 2014-03-11 at 18:16 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
> > In the current NUMA placement implementation, in libxl, what we need to
> > know is how much memory is free on each node, rather than how much
> > memory each domain occupies there.
> >
> > So, what is it that I should try to show you?
>
> I want a sketch of a race-free memory assignment approach that works
> with ballooning (and domain self-directed numa migration), and doesn't
> depend on arbitrating memory usage by having Xen fail certain memory
> allocations.
>
Mmm... I think I see what you mean now (except perhaps for the "domain
self-directed numa migration" part).
I also still don't think that this specific call/functionality should be
subject to (that much) locking, at least not at this level, but I see
the value of having a plan.
That, however, is a way bigger thing than what we're discussing here.
I'll start a discussion in another thread.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node
2014-03-11 19:04 ` Dario Faggioli
@ 2014-03-13 11:54 ` George Dunlap
0 siblings, 0 replies; 33+ messages in thread
From: George Dunlap @ 2014-03-13 11:54 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
xen-devel, Jan Beulich, Daniel De Graaf
On Tue, Mar 11, 2014 at 7:04 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mar, 2014-03-11 at 18:16 +0000, Ian Jackson wrote:
>> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 2/4] libxc: report how much memory a domain has on each NUMA node"):
>> > In the current NUMA placement implementation, in libxl, what we need to
>> > know is how much memory is free on each node, rather than how much
>> > memory each domain occupies there.
>> >
>> > So, what is it that I should try to show you?
>>
>> I want a sketch of a race-free memory assignment approach that works
>> with ballooning (and domain self-directed numa migration), and doesn't
>> depend on arbitrating memory usage by having Xen fail certain memory
>> allocations.
>>
> Mmm... I think I see what you mean now (except perhaps for the "domain
> self-directed numa migration" part).
>
> I also still don't think that this specific call/functionality should be
> subject to (that much) locking, at least not at this level, but I see
> the value of having a plan.
I think it's worth taking a little bit of time seeing if we can make
an interface that could be used by a toolstack for such a purpose.
But if it's overly complicated, I don't see a problem with having one
interface just for information, and coming up with a new race-free
interface once someone shows up that actually needs it. Otherwise we
risk spending a lot of time inventing an API that turns out not to be
suitable for purpose anyway.
-George
* [PATCH 3/4] libxl: report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
2014-03-05 14:36 ` [PATCH 1/4] xen: " Dario Faggioli
2014-03-05 14:36 ` [PATCH 2/4] libxc: " Dario Faggioli
@ 2014-03-05 14:36 ` Dario Faggioli
2014-03-10 16:40 ` Ian Jackson
2014-03-05 14:36 ` [PATCH 4/4] xl: " Dario Faggioli
` (2 subsequent siblings)
5 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 14:36 UTC (permalink / raw)
To: xen-devel
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Daniel De Graaf
by calling xc_domain_numainfo(). A new data type, libxl_domain_numainfo,
is being introduced. For now it only holds how much memory a domain has
allocated on each NUMA node, but it can be used, in the future, to report
more per-domain NUMA-related information.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
tools/libxl/libxl.c | 30 ++++++++++++++++++++++++++++++
tools/libxl/libxl.h | 2 ++
tools/libxl/libxl_types.idl | 4 ++++
3 files changed, 36 insertions(+)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 730f6e1..e5a1cb0 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4631,6 +4631,36 @@ int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
return 0;
}
+libxl_domain_numainfo *libxl_domain_get_numainfo(libxl_ctx *ctx,
+ uint32_t domid)
+{
+ GC_INIT(ctx);
+ int max_nodes;
+ libxl_domain_numainfo *ret = NULL;
+
+ max_nodes = libxl_get_max_nodes(ctx);
+ if (max_nodes < 0)
+ {
+ LOG(ERROR, "Unable to determine the number of NUMA nodes");
+ goto out;
+ }
+
+ ret = libxl__zalloc(NOGC, sizeof(libxl_domain_numainfo));
+ ret->memkbs = libxl__calloc(NOGC, max_nodes, sizeof(uint64_t));
+
+ if (xc_domain_numainfo(ctx->xch, domid, &max_nodes, ret->memkbs)) {
+ LOGE(ERROR, "getting domain NUMA info");
+ libxl_domain_numainfo_dispose(ret);
+ ret = NULL;
+ goto out;
+ }
+ ret->num_memkbs = max_nodes;
+
+ out:
+ GC_FREE;
+ return ret;
+}
+
static int libxl__set_vcpuonline_xenstore(libxl__gc *gc, uint32_t domid,
libxl_bitmap *cpumap)
{
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 06bbca6..f452752 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1067,6 +1067,8 @@ int libxl_domain_set_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
libxl_bitmap *nodemap);
int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid,
libxl_bitmap *nodemap);
+libxl_domain_numainfo *libxl_domain_get_numainfo(libxl_ctx *ctx,
+ uint32_t domid);
int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap);
libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..76158d7 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -515,6 +515,10 @@ libxl_vcpuinfo = Struct("vcpuinfo", [
("cpumap", libxl_bitmap), # current cpu's affinities
], dir=DIR_OUT)
+libxl_domain_numainfo = Struct("domain_numainfo", [
+ ("memkbs", Array(uint64, "num_memkbs")),
+ ], dir=DIR_OUT)
+
libxl_physinfo = Struct("physinfo", [
("threads_per_core", uint32),
("cores_per_socket", uint32),
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 3/4] libxl: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 3/4] libxl: " Dario Faggioli
@ 2014-03-10 16:40 ` Ian Jackson
2014-03-10 17:28 ` Dario Faggioli
0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-10 16:40 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("[PATCH 3/4] libxl: report how much memory a domain has on each NUMA node"):
> by calling xc_domain_numainfo(). A new data type, libxl_domain_numainfo,
> is introduced. For now it only holds how much memory a domain has
> allocated on each NUMA node, but it can be used, in future, to report
> more per-domain NUMA-related information.
Is there some reason this shouldn't be in the normal domain info
struct ?
Ian.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 3/4] libxl: report how much memory a domain has on each NUMA node
2014-03-10 16:40 ` Ian Jackson
@ 2014-03-10 17:28 ` Dario Faggioli
2014-03-13 17:26 ` Ian Jackson
0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-10 17:28 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On lun, 2014-03-10 at 16:40 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 3/4] libxl: report how much memory a domain has on each NUMA node"):
> > by calling xc_domain_numainfo(). A new data type, libxl_domain_numainfo,
> > is introduced. For now it only holds how much memory a domain has
> > allocated on each NUMA node, but it can be used, in future, to report
> > more per-domain NUMA-related information.
>
> Is there some reason this shouldn't be in the normal domain info
> struct ?
>
Nothing other than personal taste. For consistency with getting/setting
node affinity, I introduced a call specifically to retrieve this.
Having done that, I decided to use an independent struct as well.
There is no harm in moving it somewhere else, if you like that better.
Just to be sure, you mean putting it in here:
libxl_dominfo = Struct("dominfo",[
("uuid", libxl_uuid),
("domid", libxl_domid),
("ssidref", uint32),
("running", bool),
("blocked", bool),
("paused", bool),
("shutdown", bool),
("dying", bool),
# Valid iff (shutdown||dying).
#
# Otherwise set to a value guaranteed not to clash with any valid
# LIBXL_SHUTDOWN_REASON_* constant.
("shutdown_reason", libxl_shutdown_reason),
("outstanding_memkb", MemKB),
("current_memkb", MemKB),
("shared_memkb", MemKB),
("paged_memkb", MemKB),
("max_memkb", MemKB),
("cpu_time", uint64),
("vcpu_max_id", uint32),
("vcpu_online", uint32),
("cpupool", uint32),
("domain_type", libxl_domain_type),
], dir=DIR_OUT)
and retrieving by calling libxl_list_domain, right?
In which case, I don't think it will be that easy to have a function
that retrieves specifically this information (and whatever else we
may want to stash in a libxl_domain_numainfo type)... Do you see
any issue in dropping it? (The problem, of course, won't be the
function, but what to return from it, if I decide not to introduce an
ad-hoc type).
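To make the comparison a bit more concrete, here is a rough sketch of
what the caller side would look like if the data were folded into
libxl_dominfo instead; the numa_memkb / num_numa_memkb fields below are
made up for the sake of the example, they are not part of this series:

/*
 * Sketch of the alternative being discussed: the per-node array lives
 * in libxl_dominfo and is fetched via libxl_domain_info(). The
 * numa_memkb / num_numa_memkb fields are hypothetical, purely to show
 * what the caller side would look like.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <libxl.h>

static void show_numa_memory_from_dominfo(libxl_ctx *ctx, uint32_t domid)
{
    libxl_dominfo di;
    int i;

    libxl_dominfo_init(&di);
    if (libxl_domain_info(ctx, &di, domid)) {
        fprintf(stderr, "libxl_domain_info failed.\n");
        return;
    }

    for (i = 0; i < di.num_numa_memkb; i++)    /* hypothetical field */
        printf("node %d: %"PRIu64" Kb\n", i, di.numa_memkb[i]);

    libxl_dominfo_dispose(&di);
}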
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 3/4] libxl: report how much memory a domain has on each NUMA node
2014-03-10 17:28 ` Dario Faggioli
@ 2014-03-13 17:26 ` Ian Jackson
0 siblings, 0 replies; 33+ messages in thread
From: Ian Jackson @ 2014-03-13 17:26 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 3/4] libxl: report how much memory a domain has on each NUMA node"):
> On lun, 2014-03-10 at 16:40 +0000, Ian Jackson wrote:
> > Is there some reason this shouldn't be in the normal domain info
> > struct ?
>
> Nothing other than personal taste. For consistency with getting/setting
> node affinity, I introduced a call specifically to retrieve this.
> Having done that, I decided to use an independent struct as well.
I see. Is there a performance reason to have it in a separate struct,
or a race coherency reason to have it in the same one ?
> There is no harm in moving it somewhere else, if you like that better.
I want to know that there's a reason :-). (Or at least, that there is
no reason not to do it this way - which implies having tried to think
of pros and cons).
> Just to be sure, you mean putting it in here:
> libxl_dominfo = Struct("dominfo",[
> and retrieving by calling libxl_list_domain, right?
Right.
> In which case, I don't think it will be that easy to have a function
> that retrieves specifically this information (and whatever else we
> may want to stash in a libxl_domain_numainfo type)... Do you see
> any issue in dropping it? (The problem, of course, won't be the
> function, but what to return from it, if I decide not to introduce an
> ad-hoc type).
I'm afraid I don't understand this paragraph at all. Can you make it
less abstract ? I'm getting confused by all the "this" and "that"s, I
think.
Ian.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 4/4] xl: report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
` (2 preceding siblings ...)
2014-03-05 14:36 ` [PATCH 3/4] libxl: " Dario Faggioli
@ 2014-03-05 14:36 ` Dario Faggioli
2014-03-10 16:42 ` Ian Jackson
2014-03-05 14:40 ` [PATCH 0/4] " Juergen Gross
2014-03-10 16:37 ` Ian Jackson
5 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 14:36 UTC (permalink / raw)
To: xen-devel
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Daniel De Graaf
by introducing a new subcommand: `xl numainfo <domain>'.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
tools/libxl/xl.h | 1 +
tools/libxl/xl_cmdimpl.c | 58 +++++++++++++++++++++++++++++++++++++++++++++
tools/libxl/xl_cmdtable.c | 5 ++++
3 files changed, 64 insertions(+)
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index f188708..f519242 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -33,6 +33,7 @@ struct cmd_spec {
int main_vcpulist(int argc, char **argv);
int main_info(int argc, char **argv);
int main_sharing(int argc, char **argv);
+int main_numainfo(int argc, char **argv);
int main_cd_eject(int argc, char **argv);
int main_cd_insert(int argc, char **argv);
int main_console(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4fc46eb..07a3504 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4721,6 +4721,64 @@ int main_vcpupin(int argc, char **argv)
return vcpupin(find_domain(argv[optind]), argv[optind+1] , argv[optind+2]);
}
+static void print_domain_numainfo(uint32_t domid)
+{
+ libxl_domain_numainfo *info;
+ libxl_bitmap nodemap;
+ libxl_physinfo physinfo;
+ int i;
+
+ libxl_bitmap_init(&nodemap);
+ libxl_physinfo_init(&physinfo);
+
+ if (libxl_node_bitmap_alloc(ctx, &nodemap, 0)) {
+ fprintf(stderr, "libxl_node_bitmap_alloc_failed.\n");
+ goto out;
+ }
+ if (libxl_get_physinfo(ctx, &physinfo) != 0) {
+ fprintf(stderr, "libxl_physinfo failed.\n");
+ goto out;
+ }
+
+ if (libxl_domain_get_nodeaffinity(ctx, domid, &nodemap)) {
+ fprintf(stderr, "libxl_domain_get_nodeaffinity failed.\n");
+ goto out;
+ }
+ printf("NODE Affinity: ");
+ print_bitmap(nodemap.map, physinfo.nr_nodes, stdout);
+ printf("\n");
+
+ info = libxl_domain_get_numainfo(ctx, domid);
+ if (!info) {
+ fprintf(stderr, "libxl_domain_get_numainfo failed.\n");
+ goto out;
+ }
+ printf("Memory:\n");
+ for (i = 0; i < info->num_memkbs; i++) {
+ if (info->memkbs[i])
+ printf(" Node %d: %"PRIu64" Kb\n", i, info->memkbs[i]);
+ }
+
+ out:
+ libxl_bitmap_dispose(&nodemap);
+ libxl_physinfo_dispose(&physinfo);
+}
+
+int main_numainfo(int argc, char **argv)
+{
+ uint32_t domid;
+ int opt = 0;
+
+ SWITCH_FOREACH_OPT(opt, "", NULL, "numainfo", 1) {
+ /* No options */
+ }
+
+ domid = find_domain(argv[optind]);
+ print_domain_numainfo(domid);
+
+ return 0;
+}
+
static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host)
{
char *endptr;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index e8ab93a..4b040dd 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -201,6 +201,11 @@ struct cmd_spec cmd_table[] = {
"Set the current memory usage for a domain",
"<Domain> <MemMB['b'[bytes]|'k'[KB]|'m'[MB]|'g'[GB]|'t'[TB]]>",
},
+ { "numainfo",
+ &main_numainfo, 0, 0,
+ "Print NUMA related information for a domain",
+ "<Domain>",
+ },
{ "button-press",
&main_button_press, 0, 1,
"Indicate an ACPI button press to the domain",
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 4/4] xl: report how much memory a domain has on each NUMA node
2014-03-05 14:36 ` [PATCH 4/4] xl: " Dario Faggioli
@ 2014-03-10 16:42 ` Ian Jackson
2014-03-10 17:09 ` Dario Faggioli
0 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-10 16:42 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("[PATCH 4/4] xl: report how much memory a domain has on each NUMA node"):
> by introducing a new subcommand: `xl numainfo <domain>'.
There needs to be a machine-parseable (JSON) version of this
information. We don't want people to start parsing this new xl
output, or we'll find it hard to extend.
Maybe we should have a "xl dominfo" command which prints this and
other information.
Ian.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 4/4] xl: report how much memory a domain has on each NUMA node
2014-03-10 16:42 ` Ian Jackson
@ 2014-03-10 17:09 ` Dario Faggioli
0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-10 17:09 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On lun, 2014-03-10 at 16:42 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 4/4] xl: report how much memory a domain has on each NUMA node"):
> > by introducing a new subcommand: `xl numainfo <domain>'.
>
> There needs to be a machine-parseable (JSON) version of this
> information. We don't want people to start parsing this new xl
> output, or we'll find it hard to extend.
>
Ok, this I can look into.
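For instance (just a sketch, nothing I'm committing to yet), a first cut
could lean on the JSON helpers that the libxl IDL generates for its
types, assuming the libxl_domain_numainfo type from patch 3/4 and its
auto-generated libxl_domain_numainfo_to_json() helper:

/*
 * Sketch only: machine-parseable output via the _to_json helper that
 * the libxl IDL normally generates for its types. Assumes the
 * libxl_domain_numainfo type from patch 3/4 of this series, and that
 * libxl_domain_numainfo_to_json() is generated for it.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <libxl.h>

static void print_domain_numainfo_json(libxl_ctx *ctx, uint32_t domid)
{
    libxl_domain_numainfo *info;
    char *json;

    info = libxl_domain_get_numainfo(ctx, domid);
    if (!info) {
        fprintf(stderr, "libxl_domain_get_numainfo failed.\n");
        return;
    }

    json = libxl_domain_numainfo_to_json(ctx, info);
    if (json) {
        puts(json);
        free(json);
    }

    libxl_domain_numainfo_dispose(info);
    free(info);
}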
> Maybe we should have a "xl dominfo" command which prints this and
> other information.
>
Yeah, perhaps we should. I was looking at how to integrate this into
something already existing. The only one I found is `xl list', but
this information is impossible to fit in there nicely (imagine a domain
with memory on 8 nodes! :-O)
root@Zhaman:~# xl list 1
Name ID Mem VCPUs State Time(s)
vm-test 1 4096 16 -b---- 7.1
So, yes, I can look into this too... What kind of other information were
you thinking about?
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/4] report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
` (3 preceding siblings ...)
2014-03-05 14:36 ` [PATCH 4/4] xl: " Dario Faggioli
@ 2014-03-05 14:40 ` Juergen Gross
2014-03-05 14:44 ` Dario Faggioli
2014-03-10 16:37 ` Ian Jackson
5 siblings, 1 reply; 33+ messages in thread
From: Juergen Gross @ 2014-03-05 14:40 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On 05.03.2014 15:36, Dario Faggioli wrote:
> Going all the way from an hypercall to the `xl' sub-command
> (via libxl and libxc interfaces, of course).
>
> For the following domains:
>
> root@Zhaman:~# xl list -n
> Name ID Mem VCPUs State Time(s) NODE Affinity
> Domain-0 0 511 16 r----- 29.3 all
> vm-test 1 1024 2 -b---- 6.6 0
>
> The output looks like this:
>
> root@Zhaman:~# xl numainfo 0
> NODE Affinity: all
> Memory:
> Node 0: 251208 Kb
> Node 1: 272820 Kb
>
> root@Zhaman:~# xl numainfo 1
> NODE Affinity: 0
> Memory:
> Node 0: 1048576 Kb
>
> Regards,
> Dario
>
> ---
>
> Dario Faggioli (4):
> xen: report how much memory a domain has on each NUMA node
> libxc: report how much memory a domain has on each NUMA node
> libxl: report how much memory a domain has on each NUMA node
> xl: report how much memory a domain has on each NUMA node
>
>
> tools/libxc/xc_domain.c | 28 +++++++++++++++++
> tools/libxc/xenctrl.h | 18 +++++++++++
> tools/libxl/libxl.c | 30 ++++++++++++++++++
> tools/libxl/libxl.h | 2 +
> tools/libxl/libxl_types.idl | 4 ++
> tools/libxl/xl.h | 1 +
> tools/libxl/xl_cmdimpl.c | 58 +++++++++++++++++++++++++++++++++++
> tools/libxl/xl_cmdtable.c | 5 +++
> xen/common/domctl.c | 45 +++++++++++++++++++++++++++
> xen/include/public/domctl.h | 22 +++++++++++++
> xen/xsm/flask/hooks.c | 3 ++
> xen/xsm/flask/policy/access_vectors | 2 +
> 12 files changed, 218 insertions(+)
xl man page?
Juergen
--
Juergen Gross Principal Developer Operating Systems
PBG PDG ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932
Fujitsu e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/4] report how much memory a domain has on each NUMA node
2014-03-05 14:40 ` [PATCH 0/4] " Juergen Gross
@ 2014-03-05 14:44 ` Dario Faggioli
0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-05 14:44 UTC (permalink / raw)
To: Juergen Gross
Cc: Ian Campbell, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
Daniel De Graaf
On mer, 2014-03-05 at 15:40 +0100, Juergen Gross wrote:
> On 05.03.2014 15:36, Dario Faggioli wrote:
> > tools/libxc/xc_domain.c | 28 +++++++++++++++++
> > tools/libxc/xenctrl.h | 18 +++++++++++
> > tools/libxl/libxl.c | 30 ++++++++++++++++++
> > tools/libxl/libxl.h | 2 +
> > tools/libxl/libxl_types.idl | 4 ++
> > tools/libxl/xl.h | 1 +
> > tools/libxl/xl_cmdimpl.c | 58 +++++++++++++++++++++++++++++++++++
> > tools/libxl/xl_cmdtable.c | 5 +++
> > xen/common/domctl.c | 45 +++++++++++++++++++++++++++
> > xen/include/public/domctl.h | 22 +++++++++++++
> > xen/xsm/flask/hooks.c | 3 ++
> > xen/xsm/flask/policy/access_vectors | 2 +
> > 12 files changed, 218 insertions(+)
>
> xl man page?
>
Aha, right! I _knew_ I was missing something... I honestly had that
feeling, but could not figure out precisely what it was until you
mentioned it just now! :-P
Sorry for this, I'll include it in v2.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/4] report how much memory a domain has on each NUMA node
2014-03-05 14:36 [PATCH 0/4] report how much memory a domain has on each NUMA node Dario Faggioli
` (4 preceding siblings ...)
2014-03-05 14:40 ` [PATCH 0/4] " Juergen Gross
@ 2014-03-10 16:37 ` Ian Jackson
2014-03-10 17:12 ` Dario Faggioli
5 siblings, 1 reply; 33+ messages in thread
From: Ian Jackson @ 2014-03-10 16:37 UTC (permalink / raw)
To: Dario Faggioli
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
Dario Faggioli writes ("[PATCH 0/4] report how much memory a domain has on each NUMA node"):
> Going all the way from an hypercall to the `xl' sub-command
> (via libxl and libxc interfaces, of course).
>
> For the following domains:
>
> root@Zhaman:~# xl list -n
I don't think we should use "-n" for this. Can we use "--numa" ?
Thanks,
Ian.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/4] report how much memory a domain has on each NUMA node
2014-03-10 16:37 ` Ian Jackson
@ 2014-03-10 17:12 ` Dario Faggioli
0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2014-03-10 17:12 UTC (permalink / raw)
To: Ian Jackson
Cc: Ian Campbell, Andrew Cooper, Juergen Gross, xen-devel,
Jan Beulich, Daniel De Graaf
On lun, 2014-03-10 at 16:37 +0000, Ian Jackson wrote:
> Dario Faggioli writes ("[PATCH 0/4] report how much memory a domain has on each NUMA node"):
> > Going all the way from an hypercall to the `xl' sub-command
> > (via libxl and libxc interfaces, of course).
> >
> > For the following domains:
> >
> > root@Zhaman:~# xl list -n
>
> I don't think we should use "-n" for this. Can we use "--numa" ?
>
This is something that is already there: it shows the NUMA affinity for
the domain. E.g., _without_ any of these patches:
root@Zhaman:~# xl list -n
Name ID Mem VCPUs State Time(s) NODE Affinity
Domain-0 0 4095 16 r----- 34.1 all
vm-test 1 4096 16 -b---- 7.3 all
root@Zhaman:~# xl list -h
Usage: xl [-v] list [options] [Domain]
List information about all/some domains.
Options:
-l, --long Output all VM details
-v, --verbose Prints out UUIDs and security context
-Z, --context Prints out security context
-n, --numa Prints out NUMA node affinity
So, actually, we already use both -n and --numa
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 33+ messages in thread