From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57732)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lists@wiesinger.com>) id 1YSSf7-0003GY-Fp
	for qemu-devel@nongnu.org; Mon, 02 Mar 2015 10:53:42 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <lists@wiesinger.com>) id 1YSSf1-0007Po-9y
	for qemu-devel@nongnu.org; Mon, 02 Mar 2015 10:53:41 -0500
Received: from vps01.wiesinger.com ([46.36.37.179]:60745)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <lists@wiesinger.com>) id 1YSSf1-000724-4S
	for qemu-devel@nongnu.org; Mon, 02 Mar 2015 10:53:35 -0500
Message-ID: <54F48734.7020800@wiesinger.com>
Date: Mon, 02 Mar 2015 16:52:20 +0100
From: Gerhard Wiesinger <lists@wiesinger.com>
MIME-Version: 1.0
References: <54AE87C1.2060907@wiesinger.com>	<54AEBD43.2060705@redhat.com>	<54AEC877.9080600@wiesinger.com>	<54AECAF3.3060909@redhat.com>	<54AF047D.8010009@wiesinger.com>	<54B3B2F5.1090405@wiesinger.com>	<54B57C51.7090002@wiesinger.com>	<54B584AB.4090303@redhat.com>	<54B58AC0.5080805@wiesinger.com>	<54B58B18.9060205@redhat.com>	<54B595C7.3080101@wiesinger.com>	<54B5BF5F.9000805@redhat.com>	<54B633CE.3040901@wiesinger.com>	<54E05659.9050701@wiesinger.com>	<54E1FC2B.3030805@redhat.com>	<54E20812.4090006@wiesinger.com>	<54E20CD5.3050909@redhat.com>
	<54F2EBA5.4050907@wiesinger.com> <54F42CC7.20504@redhat.com>
In-Reply-To: <54F42CC7.20504@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in
 gettimeofday(&tp, NULL); forever
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>, Laine Stump <laine@redhat.com>, qemu-devel@nongnu.org, Cole Robinson <crobinso@redhat.com>, virt@lists.fedoraproject.org

On 02.03.2015 10:26, Paolo Bonzini wrote:
>
> On 01/03/2015 11:36, Gerhard Wiesinger wrote:
>> So far it happened only the PostgreSQL database VM. Kernel is alive
>> (ping works well). ssh is not working.
>> console window: after entering one character at login prompt, then crashed:
>> [1438.384864] Out of memory: Kill process 10115 (pg_dump) score 112 or
>> sacrifice child
>> [1438.384990] Killed process 10115 (pg_dump) total-vm: 340548kB,
>> anon-rss: 162712kB, file-rss: 220kB
> Can you get a vmcore or at least sysrq-t output?

Yes, next time it happens I can analyze it.

I think there are 2 problems:
1.) OOM (Out of Memory) problem with the low memory settings and kernel 
settings (see below)
2.) Instability problem which might depend on 1.)

What I've done so far (thanks to Andrey Korolyov for ideas and help):
a.) Updated machine type from pc-0.15 to pc-i440fx-2.2
virsh dumpxml database | grep "<type"
     <type arch='x86_64' machine='pc-0.15'>hvm</type>

virsh edit database
virsh dumpxml database | grep "<type"
     <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>

As a result, SMBIOS is updated from 2.4 to 2.8:
dmesg|grep -i SMBIOS
[    0.000000] SMBIOS 2.8 present.
b.) Switched to tsc clock, kernel parameters: clocksource=tsc nohz=off 
highres=off
c.) Changed overcommit to 1
echo "vm.overcommit_memory = 1" > /etc/sysctl.d/overcommit.conf
d.) Tried 1 VCPU instead of 2
e.) Installed 512MB vRAM instead of 384MB
f.) Prepared for sysrq and vmcore
echo "kernel.sysrq = 1" > /etc/sysctl.d/sysrq.conf
sysctl -w kernel.sysrq=1
virsh send-key database KEY_LEFTALT KEY_SYSRQ KEY_T
virsh dump domain-name /tmp/dumpfile
g.) Further ideas, not yet done: disable memory ballooning by 
blacklisting the balloon driver or removing it from the virsh XML config
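
The settings from b.), c.) and f.) can be checked inside the guest after a
reboot. A minimal sketch, assuming the standard Linux sysfs/procfs paths;
show_setting is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# Sketch: print the guest settings changed in steps b.), c.) and f.).
# show_setting is a hypothetical helper introduced for illustration.

show_setting() {
    # $1 = label to print, $2 = file holding the current value
    if [ -r "$2" ]; then
        printf '%s: %s\n' "$1" "$(cat "$2")"
    else
        printf '%s: (not available)\n' "$1"
    fi
}

# Active clocksource (expected to be tsc after step b.)
show_setting "clocksource" /sys/devices/system/clocksource/clocksource0/current_clocksource
# Overcommit policy (expected to be 1 after step c.)
show_setting "vm.overcommit_memory" /proc/sys/vm/overcommit_memory
# SysRq mask (expected to be 1 after step f.)
show_setting "kernel.sysrq" /proc/sys/kernel/sysrq
```

If a value does not match, the sysctl.d file or the kernel command line was
probably not picked up at boot.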

Summary:
1.) 512MB, tsc timer, 1VCPU, vm.overcommit_memory = 1: no OOM problem, 
no crash
2.) 512MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem, 
no crash

So the OOM problem seems to be solved (at least it hasn't happened so far) 
by installing 512MB RAM and setting vm.overcommit_memory = 1 (I guess 
just setting overcommit alone would be fine, too).
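
To confirm that no OOM kill has happened since boot, the kernel log can be
searched for the message quoted above. A rough sketch; count_oom_kills is a
hypothetical helper, and the pattern only matches the message wording seen in
this thread (other kernel versions may word it differently):

```shell
#!/bin/sh
# Sketch: count OOM-killer events in a kernel log read from stdin.
# count_oom_kills is a hypothetical helper introduced for illustration.

count_oom_kills() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status at success.
    grep -c 'Out of memory: Kill process' || true
}

# Usage inside the guest:
#   dmesg | count_oom_kills
```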

The instability hasn't occurred so far. If I can't reproduce it, I'll revert 
the settings.

Ciao,
Gerhard