Date: Wed, 24 Sep 2014 09:54:53 +0100
From: Stefan Hajnoczi
Message-ID: <20140924085453.GC21137@stefanha-thinkpad.redhat.com>
References: <20140918104922.GF8847@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] qemu process stuck in Rl state
To: Andrey Korolyov
Cc: "qemu-devel@nongnu.org"

On Thu, Sep 18, 2014 at 03:21:08PM +0400, Andrey Korolyov wrote:
> On Thu, Sep 18, 2014 at 2:49 PM, Stefan Hajnoczi wrote:
> > On Wed, Sep 17, 2014 at 11:56:57PM +0400, Andrey Korolyov wrote:
> >> I've faced an issue with qemu VMs with very large uptime spans - half
> >> a year or so. They are hanging in running state forever and are not
> >> killable in any imaginable fashion. Tried to freeze it via freezer cg
> >> without any luck. VM itself went unresponsive with zero cpu
> >> consumption after reaching 'forever running' point.
> >>
> >> I am going to reset the host in a couple of hours, so any timed ideas
> >> for debugging this state will be very appreciated.
> >
> > A couple of shots at figuring out what the process is doing:
> >
> >   cat /proc/$PID/stack
> >   cat /proc/$PID/syscall
> >   gdb -p $PID
> >   (gdb) thread apply all bt
>
> Thanks Stefan,
>
> of course any attempts to attach to the process or dump core failed at
> the very beginning. I compared proc contents with a live VM and found
> nothing suspicious. The question is about what I should try to do
> when facing a supposed kernel bug, if there is no possibility to
> determine which code is currently being executed by the emulator. Also
> if it may help, both affected VMs on different hosts have a similar
> process uptime (from end of May). Just to repeat - the process is not
> reacting to any signal, has zero CPU consumption immediately after bug
> appearance and therefore cannot be stopped/frozen.

What did cat /proc/$PID/stack and cat /proc/$PID/syscall output?

Stefan
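
For reference, the /proc reads asked about in this thread can be collected in one pass, together with the scheduler state from /proc/$PID/stat. This is only a minimal sketch, assuming a Linux host and enough privilege to read /proc/$PID/stack (typically root). The function names and the PROC_ROOT override are illustrative and not part of QEMU or any existing tool.

```shell
#!/bin/sh
# Sketch: gather the /proc entries discussed in this thread for one PID.
# state_from_stat, dump_proc_debug and PROC_ROOT are hypothetical names.

# Print the one-letter state (R, S, D, Z, T, ...) from a /proc/PID/stat line.
# The comm field is wrapped in parentheses and may itself contain spaces
# or ')', so strip everything up to the last ") " before reading the state.
state_from_stat() {
    rest=${1##*) }
    printf '%s\n' "${rest%% *}"
}

# Print state, kernel stack and current syscall for a PID.
# Files that cannot be read are reported as <unreadable> instead of aborting.
dump_proc_debug() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    if { IFS= read -r stat_line < "$root/$pid/stat"; } 2>/dev/null; then
        printf 'state: %s\n' "$(state_from_stat "$stat_line")"
    else
        printf 'state: <unreadable>\n'
    fi
    for f in stack syscall; do
        printf '== %s ==\n' "$f"
        cat "$root/$pid/$f" 2>/dev/null || printf '<unreadable>\n'
    done
}
```

Calling `dump_proc_debug "$PID"` as root on the affected host would print the three pieces of information in a form that can be pasted into a reply. Reading these files does not attach to the process, so it may still work where gdb and core dumps fail.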