* KVM and the OOM-Killer
@ 2010-05-13 12:20 James Stevens
2010-05-13 12:39 ` Avi Kivity
2010-05-14 7:33 ` Athanasius
0 siblings, 2 replies; 13+ messages in thread
From: James Stevens @ 2010-05-13 12:20 UTC (permalink / raw)
To: kvm
This is *NOT* a KVM issue, but may be worth adding into the FAQ...
We have a KVM host with 48GB of RAM and run about 20 KVM guests on it.
After some time - how long depends on the kernel version - the VM host
kernel will start OOM-killing the VM guests, even when there is lots of
free RAM (>10GB) and free swap (>34GB).
This seems to be caused by the kernel running out of LOWMEM (memory
below roughly 1GB - about 896MB on a stock 32-bit kernel). Because of
the large amount of RAM, a lot of LOWMEM (~400MB) is used by the memory
map (32 bytes per 4KB page); add in the kernel itself and that leaves
"only" about 460MB of LOWMEM for kernel allocations.
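(Rough arithmetic, assuming the 32-byte per-page structure above: 48GB /
4KB = ~12.6 million pages, and 12.6M x 32 bytes = ~384MB of LOWMEM eaten
by the memory map alone.)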
This might not have been a problem, except that Linux may also put
cache blocks and user processes in LOWMEM - it seems this can then lead
to LOWMEM exhaustion, which triggers OOM-killing even when there is
LOADS of swap and HIGHMEM free.
Sadly, killing userland processes is not a good way to free LOWMEM, so
what happens is a killing spree whereby every process on the VM host
gets killed (including all the VMs, sysklogd, klogd, sshd, udevd, etc.).
This is very bad in 2.6.32.6, quite bad in 2.6.32.9, and better (but
still bad) in 2.6.31.12 - we are currently testing 2.6.33.3.
See https://bugzilla.kernel.org/show_bug.cgi?id=15058
General advice seems to be that if you have more than 16GB of RAM you
should run a 64-bit kernel on the VM host.
We didn't see this issue on a server with 32GB running the same set of VMs.
James
* Re: KVM and the OOM-Killer
2010-05-13 12:20 KVM and the OOM-Killer James Stevens
@ 2010-05-13 12:39 ` Avi Kivity
2010-05-13 13:39 ` James Stevens
2010-05-13 13:42 ` Johannes Stezenbach
2010-05-14 7:33 ` Athanasius
1 sibling, 2 replies; 13+ messages in thread
From: Avi Kivity @ 2010-05-13 12:39 UTC (permalink / raw)
To: James Stevens; +Cc: kvm
On 05/13/2010 03:20 PM, James Stevens wrote:
> This is *NOT* a KVM issue, but may be worth adding into the FAQ...
>
>
> We have a KVM host with 48Gb of RAM and run about 20 KVM clients on
> it. After some time - different time depending on the kernel version -
> the VM host kernel will start OOM-Killing the VM clients, even when
> there is lots of free RAM (>10Gb) and free SWAP (>34Gb).
>
> This seems to be caused by the kernel running out of LOWMEM (memory
> below 1Gb) - because of the large amount of RAM a lot of LOWMEM
> (~400Mb) is used by the memory map (32 bytes per 4Kb page), add in the
> kernel itself and that leaves "only" about 460Mb of LOWMEM for kernel
> alloc.
>
> This may not have been a problem, except Linux may also put cache
> blocks and user processes in LOWMEM - it seems this can then lead to a
> LOWMEM exhaust situation which triggers OOM-Killing even when there is
> LOADS of SWAP and HIGHMEM free.
>
> Sadly, killing userland processes is not a good way to try and free
> LOWMEM, so what happens is a killing spree where by every process on
> the VM host gets killed (inc all the VMs, sysklogd, klogd, sshd, udevd
> etc).
>
> This is very bad in 2.6.32.6, quite bad in 2.6.32.9, better (but
> still bad in) 2.6.31.12 - currently testing 2.6.33.3
>
> See https://bugzilla.kernel.org/show_bug.cgi?id=15058
>
>
> General advice seems to be, if you have more than 16Gb RAM then you
> should run the VM host 64bit.
>
> We didn't see this issue on a server with 32Gb running the same set of
> VMs.
>
I'd go with 64-bit at 2GB and above. It's both faster and safer.
The lowmem load is about 0.5% of guest memory, so 48GB means 240MB
lowmem allocated. Thin ice.
Since you can run a 64-bit kernel with your existing userspace, at least
you have a simple upgrade path.
--
error compiling committee.c: too many arguments to function
* Re: KVM and the OOM-Killer
2010-05-13 12:39 ` Avi Kivity
@ 2010-05-13 13:39 ` James Stevens
2010-05-13 13:53 ` Avi Kivity
2010-05-13 13:42 ` Johannes Stezenbach
1 sibling, 1 reply; 13+ messages in thread
From: James Stevens @ 2010-05-13 13:39 UTC (permalink / raw)
To: Avi Kivity, kvm
> I'd go with 64-bit at 2GB and above. It's both faster and safer.
Safer how? (Apart from avoiding LOWMEM exhaustion.)
On a different subject, the qemu documentation says a guest VM can only
have 2GB of memory - does this still apply when using a 64-bit host O/S?
> The lowmem load is about 0.5% of guest memory, so 48GB means 240MB
> lowmem allocated. Thin ice.
We currently only have about 12GB used by the VM guests, so we're not
hitting that issue - the rest of HIGHMEM is host disk cache.
Our experience is that the bottleneck on the number of VM guests is disk
I/O - with loads of memory we've pretty much eliminated reads, so that
leaves disk writes.
> Since you can run a 64-bit kernel with your existing userspace, at least
> you have a simple upgrade path.
Not sure I like the idea of running a 64-bit kernel on top of a 32-bit
userspace - I'd prefer to re-install.
Can I just replace my kernel with a 64-bit one, or do I have to
re-install the host O/S?
James
* Re: KVM and the OOM-Killer
2010-05-13 12:39 ` Avi Kivity
2010-05-13 13:39 ` James Stevens
@ 2010-05-13 13:42 ` Johannes Stezenbach
1 sibling, 0 replies; 13+ messages in thread
From: Johannes Stezenbach @ 2010-05-13 13:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: James Stevens, kvm
On Thu, May 13, 2010 at 03:39:20PM +0300, Avi Kivity wrote:
> On 05/13/2010 03:20 PM, James Stevens wrote:
> >
> >General advice seems to be, if you have more than 16Gb RAM then
> >you should run the VM host 64bit.
> >
> >We didn't see this issue on a server with 32Gb running the same
> >set of VMs.
> >
>
> I'd go with 64-bit at 2GB and above. It's both faster and safer.
There's an interesting posting from Linus which explains details:
http://www.realworldtech.com/forums/index.cfm?action=detail&id=78966&threadid=78766&roomid=2
HTH,
Johannes
* Re: KVM and the OOM-Killer
2010-05-13 13:39 ` James Stevens
@ 2010-05-13 13:53 ` Avi Kivity
2010-05-13 18:55 ` David S. Ahern
0 siblings, 1 reply; 13+ messages in thread
From: Avi Kivity @ 2010-05-13 13:53 UTC (permalink / raw)
To: James Stevens; +Cc: kvm
On 05/13/2010 04:39 PM, James Stevens wrote:
>> I'd go with 64-bit at 2GB and above. It's both faster and safer.
>
> safer, how? (apart from no lowmem exhaust).
Nothing apart from that (well, if you run a non-PAE kernel you lose NX protection).
>
> On a different subject, the qemu documentation says a guest VM can
> only have 2Gb of memory - does this still apply when using a 64bit
> host O/S ?
The documentation is outdated.
>
>> Since you can run a 64-bit kernel with your existing userspace, at least
>> you have a simple upgrade path.
>
> Not sure I like the idea of running a 64bit user space kernel on top
> of a 32bit host, prefer to re-install.
>
> Can I just replace my kernel with a 64bit one, or do I have to
> re-install the host O/S ?
You can run 32-bit userspace with a 64-bit kernel, or reinstall,
whichever you prefer.
I once upgraded a 32-bit Fedora install to 64-bit, but that takes some
tinkering.
--
error compiling committee.c: too many arguments to function
* Re: KVM and the OOM-Killer
2010-05-13 13:53 ` Avi Kivity
@ 2010-05-13 18:55 ` David S. Ahern
0 siblings, 0 replies; 13+ messages in thread
From: David S. Ahern @ 2010-05-13 18:55 UTC (permalink / raw)
To: James Stevens; +Cc: Avi Kivity, kvm
>> Not sure I like the idea of running a 64bit user space kernel on top
>> of a 32bit host, prefer to re-install.
>>
>> Can I just replace my kernel with a 64bit one, or do I have to
>> re-install the host O/S ?
>
> You can run 32-bit userspace with a 64-bit kernel, or reinstall,
> whichever you prefer.
>
> I once upgraded a 32-bit Fedora install to 64-bit, but that takes some
> tinkering.
>
You can just install a 64-bit kernel. For RPM-based systems you have to
"unpack" the RPM using rpm2cpio. The modules.dep file cannot be updated
on the 32-bit system - you need to generate it elsewhere - and mkinitrd
needs to be modified so it doesn't try to strip the modules (s,strip,true,).
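As a rough sketch of those steps (the package name, kernel version and
paths here are placeholders, not the exact ones I used):

  rpm2cpio kernel-2.6.33.3.x86_64.rpm | cpio -idmv
  cp ./boot/vmlinuz-2.6.33.3 /boot/
  cp -a ./lib/modules/2.6.33.3 /lib/modules/
  # generate modules.dep on a 64-bit box and copy it into place, since
  # the 32-bit depmod can't handle the 64-bit modules; then build the
  # initrd with the module-strip step disabled (s,strip,true,)
  mkinitrd /boot/initrd-2.6.33.3.img 2.6.33.3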
That's all I had to do to plop a 64-bit kernel onto a 32-bit install.
David
* Re: KVM and the OOM-Killer
2010-05-13 12:20 KVM and the OOM-Killer James Stevens
2010-05-13 12:39 ` Avi Kivity
@ 2010-05-14 7:33 ` Athanasius
2010-05-14 8:10 ` James Stevens
2010-05-14 8:19 ` Balbir Singh
1 sibling, 2 replies; 13+ messages in thread
From: Athanasius @ 2010-05-14 7:33 UTC (permalink / raw)
To: James Stevens; +Cc: kvm
On Thu, May 13, 2010 at 01:20:31PM +0100, James Stevens wrote:
> We have a KVM host with 48Gb of RAM and run about 20 KVM clients on it.
> After some time - different time depending on the kernel version - the
> VM host kernel will start OOM-Killing the VM clients, even when there is
> lots of free RAM (>10Gb) and free SWAP (>34Gb).
It seems going to a 64-bit kernel is what you want, but I thought it
worth mentioning the available method for saying "try not to OOM-kill
*this* process":
echo "-16" > /proc/<pid>/oom_adj
I go through some convolutions in starting KVM VMs so that this gets
done automatically for them. It basically scores the process *way* down
when the kernel is deciding what to OOM-kill. Handy if you might
accidentally run something on the KVM host that chews up all the
available memory - it should then get killed instead of the VMs. You can
also use '-17' to exclude the process from consideration entirely.
See <kernel source>/Documentation/filesystems/proc.txt for details.
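A minimal sketch of that kind of wrapper (the qemu command line and
paths are only placeholders, use whatever you normally run):

  #!/bin/sh
  # launch the guest in the background, then push its OOM score way
  # down via $! (the pid of the backgrounded job)
  qemu-system-x86_64 -m 2048 -hda /vm/guest.img &
  echo "-16" > /proc/$!/oom_adj
  wait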
--
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
Finger athan(at)fysh.org for PGP key
"And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME
* Re: KVM and the OOM-Killer
2010-05-14 7:33 ` Athanasius
@ 2010-05-14 8:10 ` James Stevens
2010-05-14 8:21 ` Balbir Singh
2010-05-14 8:19 ` Balbir Singh
1 sibling, 1 reply; 13+ messages in thread
From: James Stevens @ 2010-05-14 8:10 UTC (permalink / raw)
To: kvm
> echo "-16"> /proc/<pid>/oom_adj
Thanks for that - yes, I know about "oom_adj", but it doesn't (totally)
work: "udevd" has a default of "-17" and it got killed anyway.
Also, the only thing this server runs is VMs, so if they can't be killed
the oom-killer will just run through everything else (syslogd, sshd,
klogd, udevd, hald, agetty, etc.) - so on balance it's a case of which
is worse? Without those daemons the system can become inaccessible and
could become unstable, so it may be better to let it kill the VMs.
My current work-around is :-
sync; echo 3 > /proc/sys/vm/drop_caches
I run it once a week, midday Sunday, in a cron job. It drops all blocks
from the disk cache and frees up pretty much all of LOWMEM - which
sounds terrible, but it's not that bad as each VM has its own cache
anyway. After running it I get ~2000 blocks/sec read for 1 minute, then
~20 to 50 blocks/sec read for the next few hours - which in terms of our
disk RAID is nearly nothing. My "normal" level of disk reads is zero.
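For reference, the crontab entry is something along these lines (the
timing is just what suits us):

  # drop the page cache at midday every Sunday
  0 12 * * 0  sync && echo 3 > /proc/sys/vm/drop_caches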
BTW: I've had very good results with "-cpu host". It's reduced my
context switching considerably - up to 50% on one server - and reduced
disk I/O (not quite sure why!).
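The flag just gets added to the normal guest command line, roughly like
this (the other options here are only placeholders):

  qemu-system-x86_64 -cpu host -m 1024 -drive file=/vm/guest.img,if=virtio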
James
* Re: KVM and the OOM-Killer
2010-05-14 7:33 ` Athanasius
2010-05-14 8:10 ` James Stevens
@ 2010-05-14 8:19 ` Balbir Singh
1 sibling, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2010-05-14 8:19 UTC (permalink / raw)
To: James Stevens, kvm
* Athanasius <kvm@miggy.org> [2010-05-14 08:33:34]:
> On Thu, May 13, 2010 at 01:20:31PM +0100, James Stevens wrote:
> > We have a KVM host with 48Gb of RAM and run about 20 KVM clients on it.
> > After some time - different time depending on the kernel version - the
> > VM host kernel will start OOM-Killing the VM clients, even when there is
> > lots of free RAM (>10Gb) and free SWAP (>34Gb).
>
> It seems going to a 64 bit kernel is what you want, but I thought it
> worth mentioning the available method to say "try not to OOM-kill *this*
> process":
>
> echo "-16" > /proc/<pid>/oom_adj
>
A lot of this is being changed, but not yet committed. There are
patches out there to deal with the lowmem issue. Meanwhile, do follow
the suggestions on oom_adj and moving to 64-bit.
--
Three Cheers,
Balbir
* Re: KVM and the OOM-Killer
2010-05-14 8:10 ` James Stevens
@ 2010-05-14 8:21 ` Balbir Singh
2010-05-14 8:43 ` James Stevens
0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2010-05-14 8:21 UTC (permalink / raw)
To: James Stevens; +Cc: kvm
* James Stevens <James.Stevens@jrcs.co.uk> [2010-05-14 09:10:19]:
> > echo "-16"> /proc/<pid>/oom_adj
>
> Thanks for that - yes, I know about "oom_adj", but it doesn't
> (totally) work. "udevd" has a default of "-17" and it got killed
> anyway.
>
> Also, the only thing this server runs is VMs so if they can't be
> killed oom-killer will just run through the everything else
> (syslogd, sshd, klogd, udevd, hald, agetty etc) - so on balance its
> a case of which is worse? Without those daemons the system can
> become inaccessible and could become unstable, so on balance it may
> be better to let it kill the VMs.
>
> My current work-around is :-
>
> sync; echo 3 > /proc/sys/vm/drop_caches
>
Have you looked at memory cgroups, and using them to set limits for the VMs?
--
Three Cheers,
Balbir
* Re: KVM and the OOM-Killer
2010-05-14 8:21 ` Balbir Singh
@ 2010-05-14 8:43 ` James Stevens
2010-05-14 12:28 ` Balbir Singh
0 siblings, 1 reply; 13+ messages in thread
From: James Stevens @ 2010-05-14 8:43 UTC (permalink / raw)
To: balbir, kvm
> Have you looked at memory cgroups and using that with limits with VMs?
The problem was *NOT* that my VMs exhausted all memory. I know that is
what "normally" triggers the oom-killer, but you have to understand that
mine was a very different scenario, hence I wanted to bring it to
people's attention. I had about 10GB of *FREE* HIGHMEM and 34GB of
*FREE* swap when the oom-killer was activated - yep, didn't make sense
to me either. If you want to study the logs:
https://bugzilla.kernel.org/show_bug.cgi?id=15058
It looks like the problem was LOWMEM exhaustion triggering the
oom-killer. Which is dumb, because it was cache that was exhausting
LOWMEM, and killing userland processes isn't a great way to deal with
that.
[My] VMs generally allocate all their resources at start-up and that's it.
Committed_AS: 14345016 kB
I tried "vm.overcommit_memory=2" and that didn't help. On a 48Gb system
oom-killer should NEVER be invoked with that kind of memory profile -
Its a quirk of running a 32bit system with *so* much memory, and the way
pre-2.6.33 handled LOWMEM.
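(For anyone wanting to try the same thing, that is just the strict
overcommit sysctl; making it persistent across reboots is an extra step:

  sysctl -w vm.overcommit_memory=2        # strict overcommit accounting
  echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf

but, as I say, it made no difference here.)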
We've now moved all the VM guests onto one server in preparation for a
re-install of the other with a 64-bit host O/S.
Tests with 2.6.33.3 (+ the latest qemu) appear to show this issue is
fixed in the latest kernel (I can see it has much improved LOWMEM
management), but we've only been running it for days, and it can take 3
to 4 weeks to trigger.
FYI: We run about 100 VM guests on 7 VM hosts in five data centres -
mostly production, some development. We've been using KVM in a
production environment for a while now - starting [in production] at
about KVM-82 on 2.6.28 - our oldest live systems now are two on KVM-84
on 2.6.28.4 and they are rock solid (one gets more punishment than it
deserves) - but they only have 16GB, so they aren't seeing LOWMEM
exhaustion because their memory map is *so* much smaller.
James
* Re: KVM and the OOM-Killer
2010-05-14 8:43 ` James Stevens
@ 2010-05-14 12:28 ` Balbir Singh
2010-05-14 13:01 ` James Stevens
0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2010-05-14 12:28 UTC (permalink / raw)
To: James Stevens; +Cc: kvm
* James Stevens <James.Stevens@jrcs.co.uk> [2010-05-14 09:43:04]:
> >Have you looked at memory cgroups and using that with limits with VMs?
>
> The problem was *NOT* that my VMs exhausted all memory. I know that
> is what "normally" triggers oom-killer, but you have to understand
> this mine was a very different scenario, hence I wanted to bring it
> to people's attention. I had about 10Gb of *FREE* HIGH and 34GB of
> *FREE* SWAP when oom-killer was activated - yep, didn't make sense
> to me either. If you want to study the logs :-
>
I understand. You could potentially encapsulate everything else -
except your VMs - in a small cgroup and frequently reclaim from it using
the memory cgroup. If drop_caches works for you, that is good too. I am
surprised that cache allocations are causing lowmem exhaustion.
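A minimal sketch of that idea with the memory cgroup controller (mount
point, group name, limit and daemon names are only arbitrary examples):

  mkdir -p /cgroup/memory
  mount -t cgroup -o memory cgroup /cgroup/memory
  mkdir /cgroup/memory/non-vm
  echo 512M > /cgroup/memory/non-vm/memory.limit_in_bytes
  # move the non-VM daemons into the group, one pid per write
  for pid in $(pidof syslogd sshd udevd); do
      echo $pid > /cgroup/memory/non-vm/tasks
  done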
--
Three Cheers,
Balbir
* Re: KVM and the OOM-Killer
2010-05-14 12:28 ` Balbir Singh
@ 2010-05-14 13:01 ` James Stevens
0 siblings, 0 replies; 13+ messages in thread
From: James Stevens @ 2010-05-14 13:01 UTC (permalink / raw)
To: balbir, kvm
> I understand, You could potentially encapsulate all else - except your
> VM's in a small cgroup and frequently reclaim from there using the
> memory cgroup. If drop caches works for you, that is good too. I am
> surprised that cache allocations are causing lowmem exhaustion.
I'm surprised that LOWMEM exhaustion causes the oom-killer to fire
(when HIGHMEM & swap are half empty), but there you go, life is full of
surprises ;)
Of course, it could be something else, but it's definitely disk I/O that
triggers the oom-killer, and clearing the cache stops it from happening
(2.6.31.12) - any alternative explanation is really, really welcome.
James