Message-ID: <520391D1.7070704@filoo.de>
Date: Thu, 08 Aug 2013 14:40:49 +0200
From: Oliver Francke
References: <51FB887F.5070908@filoo.de> <51FC2903.3030802@cloudapt.com> <5739DFCB-21A5-4AED-82BF-6B58D3E1502A@filoo.de> <20130805074835.GA12658@stefanha-thinkpad.muc.redhat.com>
In-Reply-To: <20130805074835.GA12658@stefanha-thinkpad.muc.redhat.com>
Subject: Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]
To: Stefan Hajnoczi
Cc: Josh Durgin, ceph-users@lists.ceph.com, Mike Dawson, "qemu-devel@nongnu.org"

Hi Josh,

I have a session logged with:

    debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even though I think we have a different story
here anyway.

Host kernel is 3.10.0-rc7, qemu-client 1.6.0-rc2, client kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB of
compressed logfile waiting for you ;)

Thanks in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:
> On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
>> On 02.08.2013 at 23:47, Mike Dawson wrote:
>>> We can "un-wedge" the guest by opening a NoVNC session or running a
>>> 'virsh screenshot' command. After that, the guest resumes and runs as
>>> expected. At that point we can examine the guest. Each time we'll see:
> If virsh screenshot works then this confirms that QEMU itself is still
> responding. Its main loop cannot be blocked since it was able to
> process the screendump command.
>
> This supports Josh's theory that a callback is not being invoked. The
> virtio-blk I/O request would be left in a pending state.
>
> Now here is where the behavior varies between configurations:
>
> On a Windows guest with 1 vCPU, you may see the symptom that the guest no
> longer responds to ping.
>
> On a Linux guest with multiple vCPUs, you may see the hung task message
> from the guest kernel because other vCPUs are still making progress.
> Just the vCPU that issued the I/O request and whose task is in
> UNINTERRUPTIBLE state would really be stuck.
>
> Basically, the symptoms depend not just on how QEMU is behaving but also
> on the guest kernel and how many vCPUs you have configured.
>
> I think this can explain how both problems you are observing, Oliver and
> Mike, are a result of the same bug. At least I hope they are :).
>
> Stefan

-- 
Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh
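
Stefan's reasoning in the quoted reply, that QEMU's main loop can keep answering monitor commands while one I/O request never completes, can be illustrated with a toy model. The sketch below is not QEMU or librbd code; the function and variable names (simulate, completion_callback, vcpu, main_loop) are hypothetical, and the timeout exists only so the example terminates, whereas a real guest task in UNINTERRUPTIBLE state would wait indefinitely.

```python
import threading


def simulate(lose_completion: bool, timeout: float = 0.2) -> dict:
    """Toy model: one 'vCPU' waits for an I/O completion while a
    'main loop' independently answers a monitor command."""
    io_done = threading.Event()  # set when the completion callback runs
    result = {}

    def completion_callback():
        # Hypothetical stand-in for the callback that completes the request.
        io_done.set()

    def vcpu():
        # The guest task blocks until its request completes. The timeout
        # only keeps this sketch finite.
        result["io_completed"] = io_done.wait(timeout)

    def main_loop():
        # Answering a monitor command does not depend on the pending request.
        result["monitor_responded"] = True
        if not lose_completion:
            completion_callback()

    threads = [threading.Thread(target=vcpu), threading.Thread(target=main_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate(lose_completion=True))
    print(simulate(lose_completion=False))
```

With lose_completion=True the monitor command is still answered while the waiting task never sees its completion, which matches the combination of symptoms described in the thread: a responsive QEMU process and a stuck guest task.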