From: Dario Faggioli <dario.faggioli@citrix.com>
To: Jim Fehlig <jfehlig@suse.com>
Cc: Jan Beulich <JBeulich@suse.com>, xen-devel <xen-devel@lists.xen.org>
Subject: Re: [libvirt] [PATCH 1/4] libxl: implement NUMA capabilities reporting
Date: Thu, 4 Jul 2013 18:53:12 +0200
Message-ID: <1372956792.10336.93.camel@Solace>
In-Reply-To: <51D206FB.8000908@suse.com>
[Moving the conversation to xen-devel and adding Jan, as that seems
more appropriate]
[Jan, this came up while I was implementing some NUMA bits in libvirt but, as
you'll see, the core of Jim's question is purely about Xen]
On Mon, 2013-07-01 at 16:47 -0600, Jim Fehlig wrote:
> On my non-NUMA test machine I have the cell memory reported as
>
> <memory unit='KiB'>9175040</memory>
>
Which is 8960 MiB when divided by 1024, so at least it's consistent with
the memsize that xl reports. However...
> The machine has 8G of memory, running xen 4.3 rc6, with dom0_mem=1024M. 'xl
> info --numa' says
>
> total_memory : 8190
> ...
> numa_info :
> node: memsize memfree distances
> 0: 8960 7116 10
>
> Why is the node memsize > total_memory?
Mmm... Interesting question. I never really paid attention to this...
Jan (or anyone else), is that something known and/or expected?
I went digging for this in Xen, and here's what I found.
total_memory is: info.total_pages/((1 << 20) / vinfo->pagesize)
where 'info' is what libxl_get_physinfo() provides. In turn,
libxl_get_physinfo() calls xc_physinfo(), which issues XEN_SYSCTL_physinfo,
which reports total_pages. That is set, down in __start_xen(), to the
number of pages that results from parsing the E820 map (looking for RAM
blocks).
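As a minimal sketch of the total_memory conversion above, here it is in Python. The function name pages_to_mib and the 4 KiB page size are assumptions for illustration only. The page count in the example is derived from the "System RAM: 12285MB (12580412kB)" line that Xen prints at boot on the NUMA box discussed in this mail.

```python
def pages_to_mib(total_pages: int, page_size: int = 4096) -> int:
    """Mirror total_pages / ((1 << 20) / pagesize) using integer division.

    page_size defaults to 4096, which is an assumption (typical for x86).
    """
    pages_per_mib = (1 << 20) // page_size
    return total_pages // pages_per_mib


# 12580412 kB of RAM is 12580412 / 4 = 3145103 pages of 4 KiB each.
total_pages = 12580412 // 4
print(pages_to_mib(total_pages))  # 12285
```

The division truncates, so a trailing fraction of a MiB is dropped from the reported figure.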
OTOH, memsize comes from libxl_get_numainfo(), which calls xc_numainfo(),
which issues XEN_SYSCTL_numainfo, which fills memsize with what
node_spanned_pages(<node_id>) says.
That seems to come, on a NUMA box, from parsing the SRAT and, on a
non-NUMA box, from just (end_pfn - start_pfn) (in pages, of course).
Anyway, on my NUMA box, I see something similar to what Jim sees on a
non-NUMA one:
# xl info -n
...
total_memory : 12285
...
numa_info :
node:    memsize    memfree    distances
   0:       6144         23    10,20
   1:       6720        104    20,10
Where 6144+6720=12864 > 12285
Looking at what Xen says during boot, I see this (the [*], [+], [=] and
[|] are mine):
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 0000000000096000 (usable)
(XEN) 00000000000f0000 - 0000000000100000 (reserved) [*]
(XEN) 0000000000100000 - 00000000dbdf9c00 (usable)
(XEN) 00000000dbdf9c00 - 00000000dbe4bc00 (ACPI NVS) [+]
(XEN) 00000000dbe4bc00 - 00000000dbe4dc00 (ACPI data) [=]
(XEN) 00000000dbe4dc00 - 00000000dc000000 (reserved) [|]
(XEN) 00000000f8000000 - 00000000fd000000 (reserved)
(XEN) 00000000fe000000 - 00000000fed00400 (reserved)
(XEN) 00000000fee00000 - 00000000fef00000 (reserved)
(XEN) 00000000ffb00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000324000000 (usable)
...
(XEN) System RAM: 12285MB (12580412kB)
And my math says that 12285MB is the sum of the areas marked as
(usable), i.e., I guess, what during parsing is recognised as
E820_RAM... which makes total sense.
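The sum can be checked with a short Python sketch. The region list is copied from the E820 lines shown above (anything elided by the "..." is not included), and the names E820_MAP and usable_bytes are made up for this example. The page-aligned variant is only a guess at why Xen prints 12580412kB rather than the raw byte sum: it assumes 4 KiB pages and that partial pages at region boundaries are not counted.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# (start, end, type), copied from the Xen-e820 RAM map in the boot log
E820_MAP = [
    (0x0000000000000000, 0x0000000000096000, "usable"),
    (0x00000000000f0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x00000000dbdf9c00, "usable"),
    (0x00000000dbdf9c00, 0x00000000dbe4bc00, "ACPI NVS"),
    (0x00000000dbe4bc00, 0x00000000dbe4dc00, "ACPI data"),
    (0x00000000dbe4dc00, 0x00000000dc000000, "reserved"),
    (0x00000000f8000000, 0x00000000fd000000, "reserved"),
    (0x00000000fe000000, 0x00000000fed00400, "reserved"),
    (0x00000000fee00000, 0x00000000fef00000, "reserved"),
    (0x00000000ffb00000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x0000000324000000, "usable"),
]


def usable_bytes(e820, page_size=None):
    """Sum the sizes of all 'usable' regions, optionally page-aligned."""
    total = 0
    for start, end, kind in e820:
        if kind != "usable":
            continue
        if page_size:
            start = -(-start // page_size) * page_size  # round start up
            end = end // page_size * page_size          # round end down
        total += max(0, end - start)
    return total


raw = usable_bytes(E820_MAP)
aligned = usable_bytes(E820_MAP, PAGE_SIZE)
print(raw >> 20, "MiB,", raw >> 10, "KiB")          # 12285 MiB, 12580415 KiB
print(aligned >> 20, "MiB,", aligned >> 10, "KiB")  # 12285 MiB, 12580412 KiB
```

Either way the total truncates to 12285 MiB, which matches the "System RAM" line.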
A bit below that I have this:
(XEN) SRAT: Node 1 PXM 1 0-dc000000
(XEN) SRAT: Node 1 PXM 1 100000000-1a4000000
(XEN) SRAT: Node 0 PXM 0 1a4000000-324000000
Which, after the needed calculations, gives exactly the same results
as the memsize values in `xl info -n'.
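Those calculations can be sketched in Python as follows. The sketch assumes that memsize reflects the span of each node, i.e. the distance from its lowest start address to its highest end address, which is how node_spanned_pages() is read earlier in this mail. The names SRAT and spanned_mib are hypothetical, used only for this example.

```python
# (node, start, end), copied from the SRAT lines in the boot log
SRAT = [
    (1, 0x000000000, 0x0dc000000),
    (1, 0x100000000, 0x1a4000000),
    (0, 0x1a4000000, 0x324000000),
]


def spanned_mib(srat):
    """Return {node: span in MiB}, where span = highest end - lowest start."""
    spans = {}
    for node, start, end in srat:
        lo, hi = spans.get(node, (start, end))
        spans[node] = (min(lo, start), max(hi, end))
    return {node: (hi - lo) >> 20 for node, (lo, hi) in spans.items()}


memsize = spanned_mib(SRAT)
for node in sorted(memsize):
    print(node, memsize[node])
# 0 6144
# 1 6720
print(sum(memsize.values()))  # 12864
```

Note that node 1's span covers everything between its two ranges as well, so it is larger than the sum of the two ranges taken separately.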
Now, if I add up the [*], [+], [=] and [|] regions above, and then
subtract that from node 1's PXMs, I see that node 1 has only ~6141MB of
usable RAM, instead of 6720MB.
And in fact, 6720-6141=579, just as much as 12864-12285=579.
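A small Python sketch of this reconciliation, under the same assumptions as the rest of the analysis (memsize is the span of the node, and only E820 "usable" regions count as RAM). The names USABLE, NODE1_RANGES and overlap are hypothetical, and the addresses are copied from the boot log.

```python
# Usable E820 regions and node 1's SRAT ranges, as (start, end) pairs
USABLE = [
    (0x000000000, 0x000096000),
    (0x000100000, 0x0dbdf9c00),
    (0x100000000, 0x324000000),
]
NODE1_RANGES = [
    (0x000000000, 0x0dc000000),
    (0x100000000, 0x1a4000000),
]


def overlap(a, b):
    """Number of bytes shared by two half-open (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


node1_usable = sum(overlap(u, r) for u in USABLE for r in NODE1_RANGES)
node1_span = max(e for _, e in NODE1_RANGES) - min(s for s, _ in NODE1_RANGES)

node1_usable_mib = node1_usable >> 20
node1_span_mib = node1_span >> 20
print(node1_usable_mib)                   # 6141
print(node1_span_mib)                     # 6720
print(node1_span_mib - node1_usable_mib)  # 579

# Address gap between node 1's two SRAT ranges, which lies inside its span
hole_mib = (NODE1_RANGES[1][0] - NODE1_RANGES[0][1]) >> 20
print(hole_mib)                           # 576
```

By this arithmetic, 576 of the 579 MiB correspond to the address gap between 0xdc000000 and 0x100000000, and the remainder to the non-usable regions below 0xdc000000.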
So, if I haven't messed up the calculations, it looks like Xen,
when reporting to the upper layers the amount of memory it has
available, filters out the non-RAM regions if this happens via
XEN_SYSCTL_physinfo (i.e., by parsing E820), but does not do so
if this happens via XEN_SYSCTL_numainfo (i.e., by parsing ACPI SRAT).
What I'm not sure about is whether or not that was something
known/intended and whether or not it is something we should be concerned
about.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)