Message-ID: <5203CEE4.7040901@inktank.com>
Date: Thu, 08 Aug 2013 10:01:24 -0700
From: Josh Durgin
In-Reply-To: <520391D1.7070704@filoo.de>
Subject: Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]
To: Oliver Francke
Cc: ceph-users@lists.ceph.com, Mike Dawson, Stefan Hajnoczi, "qemu-devel@nongnu.org"

On 08/08/2013 05:40 AM, Oliver Francke wrote:
> Hi Josh,
>
> I have a session logged with:
>
> debug_ms=1:debug_rbd=20:debug_objectcacher=30
>
> as you requested from Mike, even if I think, we do have another story
> here, anyway.
>
> Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
> 3.2.0-51-amd...
>
> Do you want me to open a ticket for that stuff?
> I have about 5MB compressed logfile waiting for you ;)

Yes, that'd be great. If you could include the time when you saw the
guest hang that'd be ideal. I'm not sure if this is one or two bugs,
but it seems likely it's a bug in rbd and not qemu.

Thanks!
Josh

> Thnx in advance,
>
> Oliver.
>
> On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:
>> On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
>>> On 02.08.2013 at 23:47, Mike Dawson wrote:
>>>> We can "un-wedge" the guest by opening a NoVNC session or running a
>>>> 'virsh screenshot' command. After that, the guest resumes and runs
>>>> as expected. At that point we can examine the guest. Each time we'll
>>>> see:
>> If virsh screenshot works then this confirms that QEMU itself is still
>> responding. Its main loop cannot be blocked since it was able to
>> process the screendump command.
>>
>> This supports Josh's theory that a callback is not being invoked. The
>> virtio-blk I/O request would be left in a pending state.
>>
>> Now here is where the behavior varies between configurations:
>>
>> On a Windows guest with 1 vCPU, you may see the symptom that the guest no
>> longer responds to ping.
>>
>> On a Linux guest with multiple vCPUs, you may see the hung task message
>> from the guest kernel because other vCPUs are still making progress.
>> Just the vCPU that issued the I/O request and whose task is in
>> UNINTERRUPTIBLE state would really be stuck.
>>
>> Basically, the symptoms depend not just on how QEMU is behaving but also
>> on the guest kernel and how many vCPUs you have configured.
>>
>> I think this can explain how both problems you are observing, Oliver and
>> Mike, are a result of the same bug. At least I hope they are :).
>>
>> Stefan