From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38255) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YSit7-00009y-Jd for qemu-devel@nongnu.org; Tue, 03 Mar 2015 04:13:14 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YSit3-0001SM-DK for qemu-devel@nongnu.org; Tue, 03 Mar 2015 04:13:13 -0500
Received: from vps01.wiesinger.com ([46.36.37.179]:33042) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YSit3-0001OL-2r for qemu-devel@nongnu.org; Tue, 03 Mar 2015 04:13:09 -0500
Message-ID: <54F57B17.50100@wiesinger.com>
Date: Tue, 03 Mar 2015 10:12:55 +0100
From: Gerhard Wiesinger
MIME-Version: 1.0
References: <54AE87C1.2060907@wiesinger.com> <54AEBD43.2060705@redhat.com> <54AEC877.9080600@wiesinger.com> <54AECAF3.3060909@redhat.com> <54AF047D.8010009@wiesinger.com> <54B3B2F5.1090405@wiesinger.com> <54B57C51.7090002@wiesinger.com> <54B584AB.4090303@redhat.com> <54B58AC0.5080805@wiesinger.com> <54B58B18.9060205@redhat.com> <54B595C7.3080101@wiesinger.com> <54B5BF5F.9000805@redhat.com> <54B633CE.3040901@wiesinger.com> <54E05659.9050701@wiesinger.com> <54E1FC2B.3030805@redhat.com> <54E20812.4090006@wiesinger.com> <54E20CD5.3050909@redhat.com> <54F2EBA5.4050907@wiesinger.com> <54F42CC7.20504@redhat.com> <54F48734.7020800@wiesinger.com> <54F49A95.20300@wiesinger.com>
In-Reply-To: <54F49A95.20300@wiesinger.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(&tp, NULL); forever
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Paolo Bonzini , Laine Stump , qemu-devel@nongnu.org, Cole Robinson , virt@lists.fedoraproject.org

On 02.03.2015 18:15, Gerhard Wiesinger wrote:
> On 02.03.2015 16:52, Gerhard Wiesinger wrote:
>> On 02.03.2015 10:26, Paolo Bonzini wrote:
>>>
>>> On 01/03/2015 11:36, Gerhard Wiesinger
wrote:
>>>> So far it happened only on the PostgreSQL database VM. Kernel is alive
>>>> (ping works well). ssh is not working.
>>>> console window: after entering one character at login prompt, then
>>>> crashed:
>>>> [1438.384864] Out of memory: Kill process 10115 (pg_dump) score 112 or
>>>> sacrifice child
>>>> [1438.384990] Killed process 10115 (pg_dump) total-vm: 340548kB,
>>>> anon-rss: 162712kB, file-rss: 220kB
>>> Can you get a vmcore or at least sysrq-t output?
>>
>> Yes, next time it happens I can analyze it.
>>
>> I think there are 2 problems:
>> 1.) OOM (Out of Memory) problem with the low memory settings and
>> kernel settings (see below)
>> 2.) Instability problem which might depend on 1.)
>>
>> What I've done so far (thanks to Andrey Korolyov for ideas and help):
>> a.) Updated machine type from pc-0.15 to pc-i440fx-2.2
>> virsh dumpxml database | grep "> hvm
>>
>> virsh edit database
>> virsh dumpxml database | grep "> hvm
>>
>> SMBIOS is therefore updated from 2.4 to 2.8:
>> dmesg|grep -i SMBIOS
>> [ 0.000000] SMBIOS 2.8 present.
>> b.) Switched to tsc clock, kernel parameters: clocksource=tsc
>> nohz=off highres=off
>> c.) Changed overcommit to 1
>> echo "vm.overcommit_memory = 1" > /etc/sysctl.d/overcommit.conf
>> d.) Tried 1 VCPU instead of 2
>> e.) Installed 512MB vRAM instead of 384MB
>> f.) Prepared for sysrq and vmcore
>> echo "kernel.sysrq = 1" > /etc/sysctl.d/sysrq.conf
>> sysctl -w kernel.sysrq=1
>> virsh send-key database KEY_LEFTALT KEY_SYSRQ KEY_T
>> virsh dump domain-name /tmp/dumpfile
>> g.) Further ideas, not yet done: disable memory ballooning by
>> blacklisting the balloon driver or removing it from the virsh xml config
>>
>> Summary:
>> 1.) 512MB, tsc timer, 1VCPU, vm.overcommit_memory = 1: no OOM
>> problem, no crash
>> 2.) 512MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM
>> problem, no crash
>
> 3.) 384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem,
> no crash

3b.)
It happened again during the nightly backup, with the same configuration as in 3.):
384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1, pc-i440fx-2.2: no OOM
problem, ping OK, no reaction otherwise, BUT CRASHED again

SysRq: no reaction from the VM
virsh send-key vm KEY_LEFTALT KEY_SYSRQ KEY_T

virsh dump vm file.core
error: Failed to core dump domain vm to file.core
error: internal error: unable to execute QEMU command 'migrate': State blocked by non-migratable device '0000:00:09.0/ich9_ahci'

I removed the SATA controller, so the dump should work in the future.

Any further ideas?

Ciao,
Gerhard
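
For the blocked dump, a minimal sketch of a wrapper that might help while the SATA controller is still attached: it tries the normal dump first and, if the error names a non-migratable device, retries with `virsh dump --memory-only`, which writes a guest-memory ELF core and, as far as I understand, does not go through the 'migrate' command. The function names `blocking_device` and `dump_guest` are made up for illustration, and whether `--memory-only` really avoids the blocker on this libvirt/QEMU combination is an assumption that has not been tested here.

```shell
#!/bin/sh
# Hypothetical helper: read virsh error text on stdin and print the device
# named in a "non-migratable device '...'" message (prints nothing otherwise).
blocking_device() {
    sed -n "s/.*non-migratable device '\([^']*\)'.*/\1/p"
}

# Hypothetical wrapper: try a full dump, fall back to a memory-only dump
# when the failure is caused by a non-migratable device.
# Usage: dump_guest vm /tmp/file.core
dump_guest() {
    domain=$1
    target=$2

    # Full dump first; on success there is nothing else to do.
    if err=$(virsh dump "$domain" "$target" 2>&1); then
        return 0
    fi

    dev=$(printf '%s\n' "$err" | blocking_device)
    if [ -n "$dev" ]; then
        echo "full dump blocked by $dev, retrying with --memory-only" >&2
        # Assumption: the memory-only path is not affected by the blocker.
        virsh dump --memory-only "$domain" "$target"
    else
        # Some other failure: pass the original error through unchanged.
        printf '%s\n' "$err" >&2
        return 1
    fi
}
```

A memory-only core contains guest memory and CPU registers only, so it is meant for analysis with tools such as crash or gdb and cannot be used to restore the VM.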