* "right" way to gather domU stats in xen 3 & 4?
From: Florian Heigl @ 2011-02-26 14:34 UTC
To: xen-devel
Hi all,
I'm building a xen agent for nagios / check_mk.
Automatic inventory of VMs and the basic up / down reporting are
reliable now, and I'm looking at the next items on my list.
* Free memory. This seems easy at first, look at xm info and that's
mostly it. I can have a different color for memory allocated to dom0
minus the dom0 lower balloon limit, but I'll also have a check that
will go to full alarm if anyone is crazy enough to use dom0 ballooning.
;)
What I don't know is whether I also need to subtract something for the
Xen heap. Long ago it used to default to 32MB, I think. Can someone
clue me in about that - is it relevant to xm info free / total mem?
* per domU I also wanna look at memory statistics.
- one thing is mem vs. mem-max, to show ballooning.
- the other thing is tmem: I don't know if I should spend the time
getting it right, as I'm starting to get the impression that in the
two and a half years since it was added by Dan (and now tmem2 as well),
it has been considered implemented and working, but nobody bothers to
make it work for everyone, e.g. the recent remark that the directed
ballooning daemon was just a lab exercise ;) If you know of any people who
successfully run Xen with tmem2 and such, I'd love to work with them
to build the Nagios-style statistics. Otherwise I'll save myself the
headaches.
* per domU cpu percent (to show how much of the dom0 power the vm is
consuming)...
Speed issues:
Usually checks in check_mk are fired off every minute, so it would be
good if I could collect and report my data directly via xenstore
within 1-2 seconds or less. Speed seems to be an issue I have to worry
about - on my "top of the shelf" xen host it takes around
0.6 seconds to query a meager 5 VMs.
That's just a 1.5GHz VIA box, but I'll have to see how long it takes
for 100 VMs or more.
Documentation??
What I'm missing is some document that'd show all nodes in the
xenstore that are readable. I've poked around a lot already but the
statistics are hiding from me.
Also I would try to use something that works in both Xen 4 and Xen 3. But
that's not mandatory; I can fall back from xl to xenstore-read to xm to
libvirt.
Why you might want to help:
Using check_mk you can pull off all kinds of crazy stuff with the data
it collects:
trend analysis of disk usage ("simple" example: get an alert if
your vm store is growing at a rate that will let it run out of space
in 3 days)
if somebody feels they need it, use the block IO rates to trigger
an eventhandler that will put io & cpu caps on a VM. (hosters might
love that :)
I think most of these features are not implemented in any nagios
checks so far
If I just hack it in ksh, it *will work*, but be ugly and slow :)
and of course you won't have to bother with any config files to add a VM!
Maybe someone likes xenstore *a lot* and can point me at the right spots.
Florian
p.s.:
Could interested parties consider spending a day to improve the xm list output?
It may technically make sense that a VM created using xm new has no ID
and no status instead of "-------", and that a VM that is running but didn't
use CPU during the microsecond we queried it is shown as blocked.
But it has made life harder for each and every Xen user for 5 or 6 years
now, and technical reasons really don't cut it if they turn
information into worthless bytes. (I still feel you would get an
"-r-----" state most of the time back in Xen2...)
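For an up/down check, the State column can be decoded flag by flag instead of being compared against a fixed string, so that an idle guest shown as blocked still counts as up. A minimal sketch, assuming the six flag letters documented for `xm list` (r running, b blocked, p paused, s shutdown, c crashed, d dying); `decode_state` and `is_up` are hypothetical helper names, and the up/down rule is a heuristic, not Xen's definition.

```python
# Flag letters of the xm/xl list "State" column (as documented for xm list).
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def decode_state(state: str) -> set:
    """Return the names of all flags set in a state string like '-b----'."""
    return {STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS}


def is_up(state: str) -> bool:
    """Heuristic: running or blocked (idle) counts as up, unless the guest
    is also shutting down, crashed or dying. An empty state (e.g. a domain
    defined with `xm new` but never started) counts as down."""
    flags = decode_state(state)
    if flags & {"shutdown", "crashed", "dying"}:
        return False
    return bool(flags & {"running", "blocked"})
```

A paused guest is reported as not up here; a real check might map it to WARNING instead.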
--
the purpose of libvirt is to provide an abstraction layer hiding all
xen features added since 2006 until they were finally understood and
copied by the kvm devs.
* Re: "right" way to gather domU stats in xen 3 & 4?
From: Stefano Stabellini @ 2011-02-28 11:47 UTC
To: Florian Heigl; +Cc: xen-devel
On Sat, 26 Feb 2011, Florian Heigl wrote:
> Hi all,
>
> I'm building a xen agent for nagios / check_mk.
> Automatic inventory of VMs and the basic up / down reporting are
> reliable now, and I'm looking at the next items on my list.
it looks like an interesting and useful project
> * Free memory. This seems easy at first, look at xm info and that's
> mostly it. I can have a different color for memory allocated to dom0
> minus the dom0 lower balloon limit, but I'll also have a check that
> will go to full alarm if anyone is crazy enough to use dom0 ballooning.
> ;)
> What I don't know is whether I also need to subtract something for the
> Xen heap. Long ago it used to default to 32MB, I think. Can someone
> clue me in about that - is it relevant to xm info free / total mem?
libxenlight provides a function called libxl_get_free_memory
that returns the amount of free memory in the system.
You can call it directly (adding a libxenlight dependency to your code)
or you could simply take a look at the implementation
(tools/libxl/libxl.c:libxl_get_free_memory).
Also on hosts managed by libxenlight there is an additional xenstore
node called /local/domain/0/memory/freemem-slack that contains the
amount of memory that is going to be left free for Xen.
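On hosts without libxenlight, a comparable figure can be approximated from the toolstack's `info` output minus that slack. A minimal sketch, assuming `xm info`/`xl info` print `key : value` lines with `free_memory` in MB, and assuming (not stated in this thread) that freemem-slack is stored in KiB; `parse_info`, `read_freemem_slack_kb` and `usable_free_mb` are hypothetical names. It is not a transcription of libxl_get_free_memory, only an illustration of subtracting the slack.

```python
import subprocess

SLACK_PATH = "/local/domain/0/memory/freemem-slack"


def parse_info(text: str) -> dict:
    """Parse 'key : value' lines as printed by `xm info` or `xl info`."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def read_freemem_slack_kb() -> int:
    """Return freemem-slack, or 0 where the node is absent (e.g. xend hosts)."""
    try:
        out = subprocess.run(["xenstore-read", SLACK_PATH],
                             capture_output=True, text=True, check=True).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return 0


def usable_free_mb(info: dict, slack_kb: int = 0) -> int:
    """Free memory in MB minus the slack reserved for Xen, never negative.

    Assumption: free_memory is in MB and slack_kb is in KiB.
    """
    free_kb = int(info["free_memory"]) * 1024 - slack_kb
    return max(free_kb, 0) // 1024
```

On a host, `usable_free_mb(parse_info(xm_info_output), read_freemem_slack_kb())` gives the number to graph; on xend hosts the slack simply reads as 0.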
In case you are wondering, Xen 4.1 is going to ship with two toolstacks:
the old xend and a new one that is a library called libxenlight plus a
minimal C utility called xl to invoke the library functions.
xl/libxenlight are recommended over xend.
> * per domU I also wanna look at memory statistics.
> - one thing is mem vs. mem-max, to show ballooning.
> - the other thing is tmem: I don't know if I should spend the time
> getting it right, as I'm starting to get the impression that in the
> two and a half years since it was added by Dan (and now tmem2 as well),
> it has been considered implemented and working, but nobody bothers to
> make it work for everyone, e.g. the recent remark that the directed
> ballooning daemon was just a lab exercise ;) If you know of any people who
> successfully run Xen with tmem2 and such, I'd love to work with them
> to build the Nagios-style statistics. Otherwise I'll save myself the
> headaches.
>
> * per domU cpu percent (to show how much of the dom0 power the vm is
> consuming)...
>
>
> Speed issues:
> Usually checks in check_mk are fired off every minute, so it would be
> good if I could collect and report my data directly via xenstore
> within 1-2 seconds or less. Speed seems to be an issue I have to worry
> about - on my "top of the shelf" xen host it takes around
> 0.6 seconds to query a meager 5 VMs.
> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
> for 100 VMs or more.
Xenstore can become very busy on systems with many VMs running.
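One way to keep the number of processes and round trips down is to pass several keys to a single `xenstore-read` call, which prints one value per line in argument order. A minimal sketch; `read_many` is a hypothetical helper, the `run` parameter exists only so the function can be tested without a Xen host, and values containing newlines would break the simple line split. As far as I know `xenstore-read` exits non-zero if any requested key is missing, so batch only keys you expect to exist.

```python
import subprocess


def read_many(paths, run=subprocess.run) -> dict:
    """Read several xenstore keys with a single xenstore-read process.

    Returns {path: value}. Raises CalledProcessError if xenstore-read
    fails (e.g. a key does not exist) and ValueError if the number of
    output lines does not match the number of keys requested.
    """
    paths = list(paths)
    if not paths:
        return {}
    # xenstore-read prints one value per line, in the order of the arguments.
    out = run(["xenstore-read", *paths],
              capture_output=True, text=True, check=True).stdout
    values = out.splitlines()
    if len(values) != len(paths):
        raise ValueError("unexpected xenstore-read output: %r" % out)
    return dict(zip(paths, values))
```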
* Re: "right" way to gather domU stats in xen 3 & 4?
From: Florian Heigl @ 2011-02-28 23:16 UTC
To: Stefano Stabellini; +Cc: xen-devel
Hi Stefano,
First of all, thanks for your reply!
2011/2/28 Stefano Stabellini <stefano.stabellini@eu.citrix.com>:
> On Sat, 26 Feb 2011, Florian Heigl wrote:
>> Hi all,
>>
>> I'm building a xen agent for nagios / check_mk.
>> Automatic inventory of VMs and the basic up / down reporting are
>> reliable now, and I'm looking at the next items on my list.
>
> it looks like an interesting and useful project
I hope it'll be helpful; it definitely works well for me. Things are a
lot easier if you can just say "scan for any VMs on that host" and
then they're monitored / you can assign them to clusters. (You can read
more here if you want: http://deranfangvomende.wordpress.com/2011/02/09/check_mk-xen-plugin-online/
).
I'm a *very* big fan of libxenlight. Many years ago there was
"libxen", which wasn't brought over to Xen3, and it was really time
for a new fast tool "to rule them all". (I just had to.)
The host-side agent is very small, so I'll keep it in plain /bin/sh and
use xm/xl as available. I could use Python too, but if libxenlight is
around the corner I don't want to re-introduce a Python dependency :)
I'm going to trash the local agent code a few more times, since it's
neither elegant nor fast yet. Both shell and Python should work on
Linux/NetBSD/Solaris. On the other hand, the Python bindings as shown
at http://wiki.xensource.com/xenwiki/XenApi are probably completely
outdated, and libxenlight is only available on Xen 4.1, which severely
limits its usability right now.
Not sure how to go about this, but I think it will pay off to start
simple with "xm", not thinking about performance impact, and then
rewrite the host agent later on to mostly use xl, e.g. via Python.
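Starting with `xm list` (or `xl list`, which prints the same columns) also means one process per check run rather than one per VM. A minimal parsing sketch, assuming the usual `Name ID Mem VCPUs State Time(s)` columns, domain names without spaces, and (an unverified assumption based on the `xm new` complaint earlier in the thread) that a defined-but-inactive domain prints no ID and no State, leaving four fields; `parse_list` and `list_domains` are hypothetical names.

```python
import subprocess


def parse_list(text: str) -> list:
    """Parse `xm list` / `xl list` output into one dict per domain."""
    domains = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) == 6:
            name, domid, mem, vcpus, state, cpu = fields
            domid = int(domid)
        elif len(fields) == 4:
            # Assumed layout for a defined-but-inactive domain: no ID, no State.
            name, mem, vcpus, cpu = fields
            domid, state = None, ""
        else:
            continue  # unknown layout: skip rather than guess
        domains.append({
            "name": name,
            "id": domid,
            "mem_mb": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "cpu_seconds": float(cpu),
        })
    return domains


def list_domains(tool: str = "xm") -> list:
    """Run `<tool> list` once and parse it (tool is "xm" or "xl")."""
    out = subprocess.run([tool, "list"], capture_output=True,
                         text=True, check=True).stdout
    return parse_list(out)
```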
I realize I gave too much thought to free memory and how much of it
is used by dom0/hypervisor/free. Beyond the free memory, nobody ever
cares, me included. On most of my hosts I couldn't say how much
"total_mb" they display, because I just look at the "free_mb". So that
point is sorted.
I will try digging into xentop over the next few days, as the main
magic of breaking down stats per domU is still open.
I hope I will find data other than CPU seconds used, because that
would mean UGLY calculations
(in theory: divide the CPU seconds used by the domain by its uptime
multiplied by the number of cores?)
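Because the CPU-seconds counter is cumulative, a simpler route than dividing by uptime is to compare two samples taken one check interval apart: the share of the host used in that interval is delta_cpu / (delta_wall * nr_pcpus). A minimal sketch, assuming the per-domain counter is the `Time(s)` column of `xm list`/`xl list` (taken here as CPU seconds summed over all vCPUs) and the physical CPU count is `nr_cpus` from `xm info`; `cpu_percent` is a hypothetical name, and storing the previous sample between check runs is left to the caller.

```python
def cpu_percent(prev, curr, nr_pcpus):
    """Share of total host CPU used by one domain between two samples.

    prev, curr: (wall_clock_seconds, domain_cpu_seconds) tuples, where the
    CPU seconds are assumed to be summed over all of the domain's vCPUs.
    Returns a percentage of all physical CPUs, or None if the samples
    cannot be compared (clock did not advance, or the counter went
    backwards because the domain was restarted).
    """
    wall = curr[0] - prev[0]
    used = curr[1] - prev[1]
    if wall <= 0 or used < 0 or nr_pcpus <= 0:
        return None
    return 100.0 * used / (wall * nr_pcpus)
```

For example, 30 CPU seconds consumed over a 60-second interval on a 2-core host is 25% of the host.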
Any comment about tmem / ballooning would still be great... why doesn't
anyone jump when our coolest features are mentioned? :)
I think it's important to make them visible to the general users...
>> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
>> for 100 VMs or more.
>
> Xenstore can become very busy on systems with many VMs running.
So, any advice? Obviously limiting my queries is the main trick, but
it seems the tools make a lot of calls internally.
I wonder if that post about xenstore IO performance
http://xen.1045712.n5.nabble.com/Revisiting-XenD-XenStored-performance-scalability-issues-td2504870.html
still applies. I'll try the ramdisk hack he described out of
curiosity.
Florian
--
the purpose of libvirt is to provide an abstraction layer hiding all
xen features added since 2006 until they were finally understood and
copied by the kvm devs.
* Re: "right" way to gather domU stats in xen 3 & 4?
From: Stefano Stabellini @ 2011-03-01 11:13 UTC
To: Florian Heigl; +Cc: Dan Magenheimer, xen-devel, Stefano Stabellini
On Mon, 28 Feb 2011, Florian Heigl wrote:
> Any comment about tmem / ballooning would still be great... why doesn't
> anyone jump when our coolest features are mentioned? :)
> I think it's important to make them visible to the general users...
Ballooning shouldn't be difficult, it is just a matter of reading
memory/target and memory/static-max from xenstore.
You could also read the actual memory used by the domain and compare it
with target.
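Reading those two nodes per domain is enough for a mem vs. maxmem graph. A minimal sketch, assuming both `memory/target` and `memory/static-max` sit under `/local/domain/<domid>/` and hold KiB values, and that the actual allocation comes from elsewhere (e.g. the Mem column of `xm list`, converted to KiB); `balloon_paths` and `balloon_stats` are hypothetical names.

```python
def balloon_paths(domid: int) -> list:
    """xenstore keys for the balloon target and the static maximum."""
    base = "/local/domain/%d/memory" % domid
    return [base + "/target", base + "/static-max"]


def balloon_stats(target_kb: int, static_max_kb: int, actual_kb=None) -> dict:
    """Summarise ballooning for one domain (all values assumed to be KiB).

    ballooned_pct: share of static-max currently ballooned away.
    lag_kb: actual minus target; positive means the guest still holds
    more memory than its target (balloon driver not caught up or absent).
    """
    if static_max_kb <= 0:
        raise ValueError("static-max must be positive")
    stats = {"ballooned_pct": 100.0 * (static_max_kb - target_kb) / static_max_kb}
    if actual_kb is not None:
        stats["lag_kb"] = actual_kb - target_kb
    return stats
```

Both keys of a domain can be fetched in one `xenstore-read balloon_paths(domid)...` invocation, which matters once there are many guests.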
Regarding tmem I'll let Dan comment on it.
> >> That's just a 1.5GHz VIA box, but I'll have to see how long it takes
> >> for 100 VMs or more.
> >
> > Xenstore can become very busy on systems with many VMs running.
>
> So, any advice? Obviously limiting my queries is the main trick, but
> it seems the tools make a lot of calls internally.
>
> I wonder if that post about xenstore IO performance
> http://xen.1045712.n5.nabble.com/Revisiting-XenD-XenStored-performance-scalability-issues-td2504870.html
> still applies. I'll try the ramdisk hack he described out of
> curiosity.
It still applies to XenD, but nowadays development is mostly on
xl/libxenlight, which in response to "xl list" does one xenstore read per
domain to resolve the domain name.
If you are not interested in the domain name you could just call
libxl_list_domain to get the list of running domains with a basic set
of information (see libxl_dominfo; it contains memory usage, CPU usage and
the number of online vcpus), with no xenstore transactions at all!
* RE: "right" way to gather domU stats in xen 3 & 4?
From: Dan Magenheimer @ 2011-03-01 16:32 UTC
To: Florian Heigl, Stefano Stabellini; +Cc: xen-devel
> Any comment about tmem / ballooning would still be great... why doesn't
> anyone jump when our coolest features are mentioned? :)
> I think it's important to make them visible to the general users...
Hi Florian --
Tmem has no value without guest kernel changes and getting those
changes (even though very small) into the Linux kernel has proven
to be a very long frustrating experience, which I hope will
finally come to fruition soon. Once in the upstream kernel,
distro domUs will still need to merge and enable these changes.
A couple of key things to plan for in your management tools:
1) Don't assume that the amount of memory used by a guest is
fixed and/or only under the control of your tools.
2) When tmem is in use, make sure you understand the difference
between "free memory" and "freeable memory".
Dan
* Re: "right" way to gather domU stats in xen 3 & 4?
From: Florian Heigl @ 2011-03-01 16:59 UTC
To: Dan Magenheimer; +Cc: xen-devel, Stefano Stabellini
Hi Dan,
2011/3/1 Dan Magenheimer <dan.magenheimer@oracle.com>:
> Tmem has no value without guest kernel changes and getting those
> changes (even though very small) into the Linux kernel has proven
> to be a very long frustrating experience, which I hope will
I've been wondering for some time now... can't you just push it into
Oracle VM & OEL in the meantime?
Even as an unsupported kernel, it would "work for me" and my customers.
About the long frustrating experience, see my sig :)
> A couple of key things to plan for in your management tools:
> 1) Don't assume that the amount of memory used by a guest is
> fixed and/or only under the control of your tools.
That's why I've been asking so intently. Right now it will be great to
have a graph showing mem and maxmem, but as tmem sees more
adoption, any balloon stats become less useful.
Also, as of today, half of the distros don't have working CPU
hotplug or ballooning anyway.
Anyway, thanks for the update :)
Flo
--
the purpose of libvirt is to provide an abstraction layer hiding all
xen features added since 2006 until they were finally understood and
copied by the kvm devs.