From mboxrd@z Thu Jan 1 00:00:00 1970
From: Phil Daws
Subject: Re: KVM Guest Lock up (100%) again!
Date: Fri, 12 Apr 2013 17:31:56 +0100 (BST)
Message-ID: <531816088.1344948.1365784316602.JavaMail.root@innovot.com>
References: <764654559.1031795.1365086171933.JavaMail.root@innovot.com> <717425441.1193987.1365451324088.JavaMail.root@innovot.com> <20130410141027.GI17919@redhat.com> <221698623.1341142.1365775843844.JavaMail.root@innovot.com> <20130412151316.GA25786@redhat.com>
Reply-To: Phil Daws
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Gleb Natapov
Return-path:
Received: from mx1.dc1.innovot.com ([77.73.4.109]:39754 "EHLO mx1.dc1.innovot.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752441Ab3DLQdT (ORCPT ); Fri, 12 Apr 2013 12:33:19 -0400
In-Reply-To: <20130412151316.GA25786@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Was running two guests on k3.7.10 but have now switched one to stock 2.6.32, and neither has crashed yet. Will leave running as is, and if it stays stable will switch out the other kernel to stock as well. Am wondering if I have hit a kernel buglet.

Thank you for the ftrace info. Have a great weekend.

----- Original Message -----
From: "Gleb Natapov"
To: "Phil Daws"
Cc: kvm@vger.kernel.org
Sent: Friday, 12 April, 2013 4:13:16 PM
Subject: Re: KVM Guest Lock up (100%) again!

On Fri, Apr 12, 2013 at 03:10:43PM +0100, Phil Daws wrote:
> Well this is still happening ... I have tried to isolate what could be causing it but not much luck yet. Thought the VMs may have been IO bound but that is not the case, and even tried upping the vCPU allocation from one to two as there is plenty of head room.
> When it locks up I see this on a strace:
>
> [pid 1343] read(14, 0x7fff82aeb360, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] read(7, "\0", 512) = 1
> [pid 1343] read(7, 0x7fff82aec160, 512) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] select(26, [7 10 13 14 16 17 22 25], [], [], {1, 0}) = 1 (in [16], left {0, 999981})
> [pid 1343] read(16, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0"..., 128) = 128
> [pid 1343] rt_sigaction(SIGALRM, NULL, {0x7f210b2c0510, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f210ac22500}, 8) = 0
> [pid 1343] write(8, "\0", 1) = 1
> [pid 1343] write(15, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 1343] read(16, 0x7fff82aec2d0, 128) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] timer_gettime(0x1, {it_interval={0, 0}, it_value={0, 0}}) = 0
> [pid 1343] timer_settime(0x1, 0, {it_interval={0, 0}, it_value={0, 656000000}}, NULL) = 0
> [pid 1343] select(26, [7 10 13 14 16 17 22 25], [], [], {1, 0}) = 2 (in [7 14], left {0, 999998})
> [pid 1343] read(14, "\1\0\0\0\0\0\0\0", 4096) = 8
> [pid 1343] read(14, 0x7fff82aeb360, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] read(7, "\0", 512) = 1
> [pid 1343] read(7, 0x7fff82aec160, 512) = -1 EAGAIN (Resource temporarily unavailable)
>
> Does that shed any light? Trying to find a how to for upgrading to the latest KVM/QEMU.
>
Is the lockup with upstream now? strace is not very helpful to diagnose kvm problems. Try to run ftrace: http://www.linux-kvm.org/page/Tracing

--
Gleb.
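
For anyone following the ftrace suggestion in this thread, here is a minimal sketch of switching on the kvm trace events through the ftrace control files. It is not taken from the thread: the function names are hypothetical, and it assumes debugfs is mounted at /sys/kernel/debug, the kvm module is loaded on the host, and the script runs as root.

```python
from pathlib import Path

# Assumption: debugfs is mounted at the usual location on the host.
DEFAULT_TRACING_DIR = Path("/sys/kernel/debug/tracing")


def enable_kvm_events(tracing_dir=DEFAULT_TRACING_DIR):
    """Enable every event in the kvm group, then turn the ring buffer on."""
    tracing_dir = Path(tracing_dir)
    (tracing_dir / "events" / "kvm" / "enable").write_text("1")
    (tracing_dir / "tracing_on").write_text("1")


def disable_kvm_events(tracing_dir=DEFAULT_TRACING_DIR):
    """Disable the kvm event group; tracing_on is left untouched."""
    tracing_dir = Path(tracing_dir)
    (tracing_dir / "events" / "kvm" / "enable").write_text("0")


def read_trace(tracing_dir=DEFAULT_TRACING_DIR, max_lines=1000):
    """Return up to max_lines lines from the current 'trace' buffer."""
    lines = []
    with open(Path(tracing_dir) / "trace") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

A typical use would be to call enable_kvm_events() on the host, wait for the guest to lock up, save the output of read_trace(), and then call disable_kvm_events(). The tracing_dir parameter exists so the helpers can be pointed at a different mount point.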