xen-devel.lists.xenproject.org archive mirror
* Xen NUMA memory allocation policy
@ 2013-12-17 19:41 Saurabh Mishra
  2013-12-18  1:38 ` Dario Faggioli
  0 siblings, 1 reply; 4+ messages in thread
From: Saurabh Mishra @ 2013-12-17 19:41 UTC (permalink / raw)
  To: xen-devel


Hi --

We are using Xen 4.2.2_06 on SLES SP3 Updates and wanted to know if there
is a simple way to gather information about the physical pages allocated
for an HVM guest. We are trying to figure out whether xl or xm is better
at allocating contiguous huge/large pages for a guest. I guess it does not
matter, since Xen's hypervisor implements the page allocation policies
either way.

With 'xl debug-key u' we can see how much memory was allocated from each
NUMA node, but we would also like to know how much of it was allocated as
huge pages and whether those pages were contiguous. Basically we need to
retrieve the machine pfns and the VM's pfns to do some comparison.

(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 603765):
(XEN)     Node 0: 363652
(XEN)     Node 1: 240113
(XEN) Domain 1 (total: 2096119):
(XEN)     Node 0: 1047804
(XEN)     Node 1: 1048315
(XEN) Domain 2 (total: 25164798):
(XEN)     Node 0: 12582143
(XEN)     Node 1: 12582655
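
If we are reading this right, the counts are in 4KiB pages, so for example
Domain 2's 25164798 pages come to roughly 96 GiB in total.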

We would like Xen to allocate as many contiguous huge/large pages as
possible for an HVM guest, so if there is a tunable for that, please let
me know. The HW is x86-64.

It looks like 'xl debug-key D' can tell us the gfn and mfn. What do
'order' and 'is_pod' mean in the output? How can we ensure pages are
contiguous?

(XEN) gfn: 180ee00          mfn: 1103800           order:  9  is_pod: 0
(XEN) gfn: 180f000           mfn: 8ad400            order:  9  is_pod: 0
(XEN) gfn: 180f200           mfn: 1103600           order:  9  is_pod: 0
(XEN) gfn: 180f400           mfn: 8ad200            order:  9  is_pod: 0
(XEN) gfn: 180f600           mfn: 1103400           order:  9  is_pod: 0
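
In case it helps to show what we are after: assuming 'order' is the log2
of the extent size in 4KiB pages (so order 9 would be a 2MiB superpage),
a rough script like the one below -- just a sketch on our side, not an
existing Xen tool -- could chew through that output from 'xl dmesg' and
count how many guest-adjacent extents are also machine-contiguous:

#!/usr/bin/env python
# Sketch: parse 'xl debug-key D' lines out of 'xl dmesg' and report how
# many extents are 2MiB+ superpages and how many guest-adjacent extents
# are also adjacent in machine memory. Assumes 'order' is log2 of the
# extent size in 4KiB pages; field names are taken from the dump above.
import re
import sys

LINE = re.compile(r'gfn:\s*([0-9a-f]+)\s+mfn:\s*([0-9a-f]+)'
                  r'\s+order:\s*(\d+)\s+is_pod:\s*(\d+)')

extents = []
for line in sys.stdin:
    m = LINE.search(line)
    if m:
        gfn, mfn = int(m.group(1), 16), int(m.group(2), 16)
        order, is_pod = int(m.group(3)), int(m.group(4))
        extents.append((gfn, mfn, order, is_pod))

extents.sort()  # sort by gfn
superpages = sum(1 for _, _, o, _ in extents if o >= 9)
print("extents: %d, of which order>=9 (2MiB or bigger): %d"
      % (len(extents), superpages))

# Count pairs of extents that are adjacent in both guest and machine space.
contig = 0
for (g1, m1, o1, _), (g2, m2, _, _) in zip(extents, extents[1:]):
    if g1 + (1 << o1) == g2 and m1 + (1 << o1) == m2:
        contig += 1
print("guest-adjacent extents that are also machine-contiguous: %d" % contig)

We would run it as 'xl dmesg | python check-contig.py' right after hitting
the debug key (the script name is just ours, of course).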


Thanks,
/Saurabh


* Re: Xen NUMA memory allocation policy
  2013-12-17 19:41 Xen NUMA memory allocation policy Saurabh Mishra
@ 2013-12-18  1:38 ` Dario Faggioli
  2013-12-19 23:48   ` Saurabh Mishra
  0 siblings, 1 reply; 4+ messages in thread
From: Dario Faggioli @ 2013-12-18  1:38 UTC (permalink / raw)
  To: Saurabh Mishra; +Cc: xen-devel


On Tue, 2013-12-17 at 11:41 -0800, Saurabh Mishra wrote:
> Hi --
>
Hi,

> We are using Xen 4.2.2_06 on SLES SP3 Updates and wanted to know if
> there is a simple way to gather information about the physical pages
> allocated for an HVM guest. 
>
In general, no, there are no simple ways to retrieve such information.
Actually, putting together something that would let one get much more
info on the memory layout of a guest (wrt NUMA) has been on my TODO list
for quite some time, but I haven't got there yet... I'll get there
eventually, and any help is appreciated! :-)

> We are trying to figure out whether xl or xm is better at allocating
> contiguous huge/large pages for a guest. I guess it does not matter,
> since Xen's hypervisor implements the page allocation policies either
> way.
> 
Indeed. What changes between xl and xm/xend is whether and how they
build up a vcpu-to-pcpu pinning mask when the domain is created. In
fact, as of now, that is all that matters as far as allocating pages on
nodes (which happens in the hypervisor) is concerned.

In both cases, if you specify a vcpu-to-pcpu pinning mask in the domain
config file, it is passed directly to the hypervisor, which then
allocates memory by striping the pages across the NUMA nodes to which
the pcpus in the mask belong.
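
For instance (purely illustrative, the right pcpu numbers depend on your
host topology, which 'xl info -n' shows), if pcpus 0-7 all belong to node
0, then putting this in the config file:

cpus = "0-7"

should make the hypervisor take all of the guest's memory from node 0, as
long as there is enough free memory there.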

Also, in case no pinning is specified in the config file, both
toolstacks try to come up with the best possible placement of the new
guest on the host NUMA nodes, build up a suitable vcpu-to-pcpu pinning
mask, pass it to the hypervisor, and... see above. :-)

What differs between xl and xm is the algorithm used to come up with
such automatic placement (i.e., both algorithms are based on some
heuristics, but those heuristics are different). I'd say that xl's
algorithm is better, but that's a very biased opinion, as I'm the one
who wrote it! :-P
However, since xl is the default toolstack, while xm is already
deprecated and soon won't even be built by default, I'd definitely say:
try xl, and if there is anything that doesn't work or seems wrong,
please report it here (putting me in Cc).

Hope this clarifies things a bit for you...

> With 'xl debug-key u' we can see how much memory was allocated from
> each NUMA node, but we would also like to know how much of it was
> allocated as huge pages and whether those pages were contiguous. 
>
I'm not aware of any tool giving this sort of information.

> Basically we need to retrieve the machine pfns and the VM's pfns to do
> some comparison.
> 
Well, at some point, for debugging and understanding purposes, I wrote
something called xen-mfnup, which is in tree:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=ae763e4224304983a1cde2fbb3d6e0c4d60b2688

It does allow you to get some insight into pfns and mfns, but not as
much as you need, I'm afraid (not to mention that I wrote it mostly with
PV guests in mind, and tested it mostly on them).

> (XEN) Memory location of each domain:
> (XEN) Domain 0 (total: 603765):
> (XEN)     Node 0: 363652
> (XEN)     Node 1: 240113
> (XEN) Domain 1 (total: 2096119):
> (XEN)     Node 0: 1047804
> (XEN)     Node 1: 1048315
> (XEN) Domain 2 (total: 25164798):
> (XEN)     Node 0: 12582143
> (XEN)     Node 1: 12582655
> 
> 
Mmm... BTW, if I can ask, what's the config file for these domains?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: Xen NUMA memory allocation policy
  2013-12-18  1:38 ` Dario Faggioli
@ 2013-12-19 23:48   ` Saurabh Mishra
  2013-12-20 13:52     ` Dario Faggioli
  0 siblings, 1 reply; 4+ messages in thread
From: Saurabh Mishra @ 2013-12-19 23:48 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel


Hi Dario --

>Mmm... BTW, if I can ask, what's the config file for these domains?
>Regards,
>Dario

Here's the config file :-

name = "hvm-vm6"
boot = "c"
memory = 98304
vcpus = 32
disk = [ 'file:<image,hda,w' ]
vif = [ 'model=e1000, mac=06:00:00:00:00:00, bridge=br0',
'model=e1000, mac=06:00:01:00:00:00, bridge=br1' ]
pci = [  '0000:07:10.0=0@0a', '0000:07:10.2=0@0b', '0000:07:10.4=0@0c',
'0000:07:10.6=0@0d', '0000:07:11.0=0@0e', '0000:07:11.2=0@0f',
'0000:07:10.1=0@10', '0000:07:10.3=0@11', '0000:07:10.5=0@12',
'0000:07:10.7=0@13', '0000:07:11.1=0@14', '0000:88:10.0=0@15',
'0000:88:10.2=0@16', '0000:88:10.4=0@17', '0000:88:10.6=0@18',
'0000:88:11.0=0@19', '0000:88:11.2=0@1a',
'0000:88:10.1=0@1b', '0000:88:10.3=0@1c', '0000:88:10.5=0@1d',
'0000:88:10.7=0@1e', '0000:88:11.1=0@1f' ]
cpus = [  '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15',
'16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27',
'28', '29', '30', '31', '32', '33', '34', '35' ]

# HVM specific
kernel = "hvmloader"
builder = "hvm"
device_model = "qemu-dm"

# Enable ACPI support
acpi = 1

# Enable serial console
serial = "pty"

# Enable VNC
vnc = 1
vnclisten = "0.0.0.0"

pci_msitranslate = 0

# Default behavior for following events
on_reboot = "destroy"

# Disable Xen Platform PCI device
xen_platform_pci=0


Thanks,
/Saurabh


* Re: Xen NUMA memory allocation policy
  2013-12-19 23:48   ` Saurabh Mishra
@ 2013-12-20 13:52     ` Dario Faggioli
  0 siblings, 0 replies; 4+ messages in thread
From: Dario Faggioli @ 2013-12-20 13:52 UTC (permalink / raw)
  To: Saurabh Mishra; +Cc: xen-devel


On Thu, 2013-12-19 at 15:48 -0800, Saurabh Mishra wrote:
> Hi Dario --
> 
Hi,

> >Mmm... BTW, if I can ask, what's the config file for these domains?
>
> Here's the config file :-
> 
> 
> name = "hvm-vm6"
> boot = "c"
> memory = 98304
> vcpus = 32
> disk = [ 'file:<image,hda,w' ]
> vif = [ 'model=e1000, mac=06:00:00:00:00:00, bridge=br0',
> 'model=e1000, mac=06:00:01:00:00:00, bridge=br1' ]
> pci = [  '0000:07:10.0=0@0a', '0000:07:10.2=0@0b',
> '0000:07:10.4=0@0c', '0000:07:10.6=0@0d', '0000:07:11.0=0@0e',
> '0000:07:11.2=0@0f', '0000:07:10.1=0@10', '0000:07:10.3=0@11',
> '0000:07:10.5=0@12', '0000:07:10.7=0@13', '0000:07:11.1=0@14',
> '0000:88:10.0=0@15', '0000:88:10.2=0@16', '0000:88:10.4=0@17',
> '0000:88:10.6=0@18', '0000:88:11.0=0@19', '0000:88:11.2=0@1a',
> '0000:88:10.1=0@1b', '0000:88:10.3=0@1c', '0000:88:10.5=0@1d',
> '0000:88:10.7=0@1e', '0000:88:11.1=0@1f' ]
> cpus = [  '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14',
> '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25',
> '26', '27', '28', '29', '30', '31', '32', '33', '34', '35' ]
> 
Ok. The reason I asked is that I was seeing the memory for the domain
being allocated from multiple NUMA nodes, which is what happens by
default, so I was wondering whether there could be problems somewhere...

However, now I see that that's because of the 'cpus' line above, which
pins the vcpus at creation time.
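
If what you actually want is all of the guest's memory on one node, the
same mechanism works the other way around: list only the pcpus of a
single node in 'cpus' (e.g. cpus = "4-19", assuming those all sit on node
0 -- I'm only guessing your topology here, 'xl info -n' will tell you for
sure), and the allocation should then stay on that node, provided it has
enough free memory.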

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



end of thread

Thread overview: 4+ messages
2013-12-17 19:41 Xen NUMA memory allocation policy Saurabh Mishra
2013-12-18  1:38 ` Dario Faggioli
2013-12-19 23:48   ` Saurabh Mishra
2013-12-20 13:52     ` Dario Faggioli
