From: John Snow
Subject: Infinite IRQ injection loop in QEMU
Date: Wed, 22 Oct 2014 11:33:41 -0400
Message-ID: <5447CE55.4020308@redhat.com>
To: kvm@vger.kernel.org
Cc: Stefan Hajnoczi

Hello all,

I've been working on improving QEMU's AHCI device emulation, but I recently ran into an issue: Windows 8 guests, when resuming from hibernation, trigger an infinite IRQ injection loop in which the IRQ never seems to be properly cleared. I am still troubleshooting it, but I wanted to ask whether anyone has advice or experience with this type of issue.

In a nutshell:

- Windows 8 boots inside QEMU/KVM.
- Windows 8 is suspended to disk, either via "Shut down" or an explicit hibernate. QEMU exits.
- Windows 8 is resumed.
- Windows 8 resets the AHCI device and begins re-initializing it.
- Once the active AHCI port is reset, the device issues an interrupt to indicate that it has a pending message (a set of register values) ready for the host to synchronize its state with the HBA. This interrupt appears to be a legacy PCI (INTx) interrupt, not MSI.
- This triggers an infinite injection loop.

Below are some characteristic traces captured with perf record, collecting KVM-related events along with user-space traces.
Here is where the interrupt first appears to become stuck, showing when it is set:

http://pastebin.com/KPevxCw2

It looks like pin #16, vec=177. All activity in the guest and in QEMU then apparently ceases, and the perf script output shows many, many iterations of the following loop, repeating over and over:

http://pastebin.com/qYh9035y

QEMU does not appear to be setting the IRQ again, and there are no further calls from the guest into ICH9- or AHCI-related code to set or clear any device registers.

In talking with Stefan, we think the IRR bit is possibly not getting cleared (or is getting set again) after the EOI (see the first paste). Does anyone have experience debugging this type of issue, or any hints about what may be happening?

Thanks in advance for any advice.

--John S.

(P.S.: The kernel I am using is the version provided by David Airlie for MST (Multi-Stream Transport) support in Linux, which is still experimental. Sorry for the non-stock kernel! http://airlied.livejournal.com/79657.html)
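To make the suspected failure mode concrete, here is a toy Python model of a single level-triggered interrupt pin. It is hypothetical and is not QEMU or KVM code; LevelTriggeredPin and run are illustrative names. It assumes simplified IOAPIC semantics: delivering the interrupt sets a Remote IRR flag, the guest's EOI clears it, and if the device is still asserting the line at EOI time the pin is delivered again immediately. Here line_asserted roughly plays the role of the pin's IRR (request) bit. If nothing ever lowers the line (for example, the guest never clears the AHCI interrupt status, or the emulated device never deasserts INTx after the status is cleared), every EOI produces another injection, which would look like the endlessly repeating loop described above.

```python
from dataclasses import dataclass


@dataclass
class LevelTriggeredPin:
    """Toy model of one level-triggered IOAPIC input (hypothetical, simplified)."""
    vector: int
    line_asserted: bool = False  # level currently driven by the device (~ IRR bit)
    remote_irr: bool = False     # set on delivery, cleared by the guest's EOI
    deliveries: int = 0          # how many times the vector has been injected

    def set_level(self, asserted: bool) -> None:
        self.line_asserted = asserted
        self._maybe_deliver()

    def eoi(self) -> None:
        # The guest acknowledges the interrupt. If the device is still
        # driving the line, the pin is delivered again right away.
        self.remote_irr = False
        self._maybe_deliver()

    def _maybe_deliver(self) -> None:
        # Deliver only when the line is high and no delivery is outstanding.
        if self.line_asserted and not self.remote_irr:
            self.remote_irr = True
            self.deliveries += 1


def run(guest_clears_device_status: bool, max_eois: int = 1000) -> int:
    """Return how many injections happen before the pin goes quiet (or the cap hits)."""
    pin = LevelTriggeredPin(vector=177)  # vector number taken from the trace above
    pin.set_level(True)                  # device raises INTx after the port reset
    eois = 0
    while pin.remote_irr and eois < max_eois:
        if guest_clears_device_status:
            # The ISR clears the device's interrupt status, so the line drops.
            pin.set_level(False)
        pin.eoi()
        eois += 1
    return pin.deliveries


if __name__ == "__main__":
    print(run(guest_clears_device_status=True))
    print(run(guest_clears_device_status=False))
```

With the status cleared before the EOI, run(True) returns 1: one injection, then the pin stays quiet. Without it, run(False) re-injects on every EOI and only stops at the artificial max_eois cap, returning 1001.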