From: Gleb Natapov
Subject: Re: KVM Guest Lock up (100%) again!
Date: Fri, 12 Apr 2013 18:13:16 +0300
Message-ID: <20130412151316.GA25786@redhat.com>
References: <764654559.1031795.1365086171933.JavaMail.root@innovot.com> <717425441.1193987.1365451324088.JavaMail.root@innovot.com> <20130410141027.GI17919@redhat.com> <221698623.1341142.1365775843844.JavaMail.root@innovot.com>
In-Reply-To: <221698623.1341142.1365775843844.JavaMail.root@innovot.com>
To: Phil Daws
Cc: kvm@vger.kernel.org

On Fri, Apr 12, 2013 at 03:10:43PM +0100, Phil Daws wrote:
> Well this is still happening ... I have tried to isolate what could be causing but not much luck yet. Thought the VMs may have been IO bound but that not the case and even tried upping the vCPU allocation from one to two as plenty of head room.
> When it locks up I see this on a strace:
>
> [pid 1343] read(14, 0x7fff82aeb360, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] read(7, "\0", 512) = 1
> [pid 1343] read(7, 0x7fff82aec160, 512) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] select(26, [7 10 13 14 16 17 22 25], [], [], {1, 0}) = 1 (in [16], left {0, 999981})
> [pid 1343] read(16, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0"..., 128) = 128
> [pid 1343] rt_sigaction(SIGALRM, NULL, {0x7f210b2c0510, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f210ac22500}, 8) = 0
> [pid 1343] write(8, "\0", 1) = 1
> [pid 1343] write(15, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 1343] read(16, 0x7fff82aec2d0, 128) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] timer_gettime(0x1, {it_interval={0, 0}, it_value={0, 0}}) = 0
> [pid 1343] timer_settime(0x1, 0, {it_interval={0, 0}, it_value={0, 656000000}}, NULL) = 0
> [pid 1343] select(26, [7 10 13 14 16 17 22 25], [], [], {1, 0}) = 2 (in [7 14], left {0, 999998})
> [pid 1343] read(14, "\1\0\0\0\0\0\0\0", 4096) = 8
> [pid 1343] read(14, 0x7fff82aeb360, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 1343] read(7, "\0", 512) = 1
> [pid 1343] read(7, 0x7fff82aec160, 512) = -1 EAGAIN (Resource temporarily unavailable)
>
> Does that shed any light ? Trying to find a how to for upgrading to the latest KVM/QEMU.
>
Is the lockup with upstream now? strace is not very helpful to diagnose
kvm problems. Try to run ftrace: http://www.linux-kvm.org/page/Tracing

--
Gleb.
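
For reference, one common way to capture KVM tracepoints through ftrace on the host is the trace-cmd front end. The sketch below only assembles the two command lines involved and prints them; it is not taken from the thread. The helper name kvm_trace_commands, the output file name kvm-trace.dat, and the 20000 KB per-CPU buffer size are illustrative assumptions, and it assumes trace-cmd is installed and run as root on the host.

```python
import shlex


def kvm_trace_commands(buffer_kb=20000, data_file="kvm-trace.dat"):
    """Build trace-cmd command lines for recording and reading KVM events.

    buffer_kb and data_file are illustrative defaults, not values from
    the thread.
    """
    # Record every tracepoint in the "kvm" event group, system-wide,
    # until interrupted with Ctrl-C.
    #   -b: per-CPU ring buffer size in kilobytes
    #   -e: event (or event group) to enable
    #   -o: file the raw trace is written to
    record = ["trace-cmd", "record", "-b", str(buffer_kb),
              "-e", "kvm", "-o", data_file]
    # Turn the recorded data into human-readable text on stdout.
    report = ["trace-cmd", "report", "-i", data_file]
    return record, report


if __name__ == "__main__":
    record, report = kvm_trace_commands()
    # Only print the commands; run them by hand as root on the host.
    print(shlex.join(record))
    print(shlex.join(report))
```

The intended use would be to start the record command while the guest is locked up (or just before reproducing the lockup), stop it with Ctrl-C after a few seconds, and then post the output of the report command to the list.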