From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51631)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ajHOt-0006ct-Up for qemu-devel@nongnu.org;
	Thu, 24 Mar 2016 22:23:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1ajHOs-0001Iq-WA for qemu-devel@nongnu.org;
	Thu, 24 Mar 2016 22:22:59 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37444)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ajHOs-0001Im-QN for qemu-devel@nongnu.org;
	Thu, 24 Mar 2016 22:22:58 -0400
Date: Thu, 24 Mar 2016 20:22:55 -0600
From: Alex Williamson
Message-ID: <20160324202255.4a397ca8@ul30vt.home>
In-Reply-To: <56F49681.8040101@cn.fujitsu.com>
References: <1458727927-15082-1-git-send-email-caoj.fnst@cn.fujitsu.com>
	<1458727927-15082-12-git-send-email-caoj.fnst@cn.fujitsu.com>
	<20160324165428.68bbcc96@t450s.home>
	<56F49681.8040101@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doing aer recovery
To: Chen Fan
Cc: izumi.taku@jp.fujitsu.com, Cao jin , qemu-devel@nongnu.org, mst@redhat.com

On Fri, 25 Mar 2016 09:38:09 +0800
Chen Fan wrote:

> On 03/25/2016 06:54 AM, Alex Williamson wrote:
> > On Wed, 23 Mar 2016 18:12:06 +0800
> > Cao jin wrote:
> >
> >> From: Chen Fan
> >>
> >> when a physical device aer occurred, the device state probably
> >> is not in D0 in a short time, if we recover the device quickly.
> >> we may stuck in D3 state when force to change device state to D0.
> >> we may need to wait for a short time to inject the error to guest.
> >>
> >> Signed-off-by: Chen Fan
> >> ---
> >>  hw/vfio/pci.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 25fc095..5216e7f 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -2658,6 +2658,9 @@ static void vfio_err_notifier_handler(void *opaque)
> >>      msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >>                               PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >>
> >> +    /* wait a bit to ensure aer device is ready */
> >> +    usleep(2 * 1000);
> > Where does this number come from?  Why would the device be in D3?  I
> > don't understand this at all.
> Hi Alex,
>
> when I tested the code in my environment, I found that when I used
> the aer-inject module to inject a fake aer error to device on host, the qemu
> would throw out the message "vfio: Unable to power on device, stuck in D3"
> on and off. if I use "gdb" to debug the vfio_pci_pre_reset, the phenomenon
> would not appearance, I just thought it should be some timing race issue,
> so I use a sleep() to wait 2ms (double the reset time of 1ms) to ensure the
> device state is ready. maybe the root reason still need to be
> investigated deeply.

Yes, it sounds like you need to investigate this further, the delay is
arbitrary and perhaps suggests a race that needs to be fixed
correctly.  Thanks,

Alex
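
To make the objection to the fixed usleep() concrete, here is a minimal
sketch of waiting on the observed power state with a bounded number of
polls, rather than sleeping for an arbitrary interval. It is only an
illustration and does not address the underlying race discussed above.
The names wait_for_d0, read_pmcsr_fn, struct fake_dev and fake_read_pmcsr
are hypothetical and are not QEMU or vfio APIs; the mask value matches
PCI_PM_CTRL_STATE_MASK (0x0003) from the PCI power management register
layout, and the fake device stands in for real config-space reads.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Power-state field of the PCI PM control/status register (PMCSR). */
#define PM_CTRL_STATE_MASK 0x0003u  /* same value as PCI_PM_CTRL_STATE_MASK */
#define PM_STATE_D0        0x0000u
#define PM_STATE_D3HOT     0x0003u

/* Hypothetical callback: returns the current PMCSR value for a device. */
typedef uint16_t (*read_pmcsr_fn)(void *dev);

/*
 * Poll until the device reports D0 or the poll budget runs out.
 * Returns true if D0 was observed, false if the budget was exhausted.
 */
static bool wait_for_d0(read_pmcsr_fn read_pmcsr, void *dev,
                        unsigned max_polls, long poll_interval_us)
{
    struct timespec ts = {
        .tv_sec = poll_interval_us / 1000000L,
        .tv_nsec = (poll_interval_us % 1000000L) * 1000L,
    };

    for (unsigned i = 0; i < max_polls; i++) {
        if ((read_pmcsr(dev) & PM_CTRL_STATE_MASK) == PM_STATE_D0) {
            return true;
        }
        nanosleep(&ts, NULL);  /* back off before the next read */
    }
    return false;
}

/* Fake device for illustration: reports D3hot for a few reads, then D0. */
struct fake_dev {
    unsigned reads_until_d0;
};

static uint16_t fake_read_pmcsr(void *opaque)
{
    struct fake_dev *d = opaque;

    if (d->reads_until_d0 > 0) {
        d->reads_until_d0--;
        return PM_STATE_D3HOT;
    }
    return PM_STATE_D0;
}
```

With this shape the caller gets an explicit success or failure result,
so a device that never leaves D3 is reported instead of being silently
assumed ready after a fixed delay.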