Message-ID: <56F49681.8040101@cn.fujitsu.com>
Date: Fri, 25 Mar 2016 09:38:09 +0800
From: Chen Fan
To: Alex Williamson, Cao jin
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org, mst@redhat.com
In-Reply-To: <20160324165428.68bbcc96@t450s.home>
Subject: Re: [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doing aer recovery

On 03/25/2016 06:54 AM, Alex Williamson wrote:
> On Wed, 23 Mar 2016 18:12:06 +0800
> Cao jin wrote:
>
>> From: Chen Fan
>>
>> When a physical device hits an AER error and we recover the device
>> quickly, the device may not yet be back in D0, so forcing it into D0
>> can leave it stuck in D3. We may need to wait a short time before
>> injecting the error into the guest.
>>
>> Signed-off-by: Chen Fan
>> ---
>>  hw/vfio/pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 25fc095..5216e7f 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2658,6 +2658,9 @@ static void vfio_err_notifier_handler(void *opaque)
>>          msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>                                   PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>
>> +        /* wait a bit to ensure aer device is ready */
>> +        usleep(2 * 1000);
> Where does this number come from?  Why would the device be in D3?  I
> don't understand this at all.

Hi Alex,

When I tested the code in my environment, using the aer-inject module to
inject a fake AER error into the device on the host, QEMU intermittently
printed the message "vfio: Unable to power on device, stuck in D3". When I
stepped through vfio_pci_pre_reset with gdb, the problem did not appear, so
I assumed it was a timing race and added a usleep() to wait 2 ms (double the
1 ms reset time) so that the device state would be ready. The root cause
still needs deeper investigation.

Thanks,
Chen

>
>> +
>>          pcie_aer_msg(dev, &msg);
>>          return;
>>      }
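Alex's question about where the 2 ms figure comes from points at an alternative to a fixed sleep: poll the device's power state and stop as soon as it reports D0, with the same 2 ms as an upper bound. The C sketch below is not QEMU code. `sim_read_pmcsr()`, `wait_for_d0()`, `sleep_us()` and the simulated device are hypothetical stand-ins; the only assumption about hardware is that bits 1:0 of the PCI PM Control/Status register (PMCSR) encode the power state (0 = D0, 3 = D3hot), as the PCI Power Management specification defines.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <stdint.h>
#include <time.h>

/* Power-state field (bits 1:0) of the PCI PM Control/Status register:
 * 0 = D0, 1 = D1, 2 = D2, 3 = D3hot. */
#define PMCSR_STATE_MASK 0x3u

/* Hypothetical simulated device: reports D3hot for the first
 * reads_until_d0 reads, then D0. Stands in for a real config-space read. */
static int reads_until_d0 = 3;

static uint16_t sim_read_pmcsr(void)
{
    if (reads_until_d0 > 0) {
        reads_until_d0--;
        return 0x0003; /* D3hot */
    }
    return 0x0000;     /* D0 */
}

/* Sleep for the given number of microseconds. */
static void sleep_us(long us)
{
    struct timespec ts;
    ts.tv_sec = us / 1000000;
    ts.tv_nsec = (us % 1000000) * 1000;
    nanosleep(&ts, NULL);
}

/* Hypothetical helper: poll the power state every step_us microseconds
 * until the device reports D0 or timeout_us has elapsed. Returns the last
 * observed state, so 0 means the device reached D0. */
static unsigned wait_for_d0(uint16_t (*read_pmcsr)(void),
                            long timeout_us, long step_us)
{
    long waited = 0;
    unsigned state = read_pmcsr() & PMCSR_STATE_MASK;

    while (state != 0 && waited < timeout_us) {
        sleep_us(step_us);
        waited += step_us;
        state = read_pmcsr() & PMCSR_STATE_MASK;
    }
    return state;
}
```

For example, `wait_for_d0(sim_read_pmcsr, 2 * 1000, 100)` keeps the patch's 2 ms worst case but returns as soon as D0 is observed, and a nonzero return value says which state the device was still in. Whether polling is the right fix at all depends on the root cause, which the thread leaves open.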