From: Zhou Jie
Date: Tue, 21 Jun 2016 10:16:25 +0800
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume
To: Alex Williamson
Cc: fan.chen@easystack.cn, mst@redhat.com, qemu-devel@nongnu.org,
 caoj.fnst@cn.fujitsu.com, Chen Fan, izumi.taku@jp.fujitsu.com
In-Reply-To: <20160620103226.0ff61b21@ul30vt.home>
References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>
 <20160527100655.60db8206@t450s.home>
 <30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com>
 <41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com>
 <20160620103226.0ff61b21@ul30vt.home>

Hi, Alex

> I was really hoping to hear your opinion, or at least some further
> discussion of pros and cons rather than simply parroting back my idea.

I understand.

> My current thinking is that a resume notifier to userspace is poorly
> defined, it's not clear what the user can and cannot do between an
> error notification and the resume notification.

Yes, it is better for the user to do nothing between the error
notification and the resume notification.

> One approach to solve
> that might be that the kernel internally handles the resume
> notifications.
> Maybe that means blocking the ioctl (interruptible
> timeout) until the internal resume occurs, or maybe that means
> returning -EAGAIN.

I don't think that is a good idea.
The kernel gives the error and resume notifications, and that is enough.
It is up to the user how to use them.

> Probably implementations of each need to be worked
> through to determine which is better. We don't want to add complexity
> to the kernel simply to make things easier for userspace, but we also
> don't want a poorly specified interface that is difficult for
> userspace to use correctly. Thanks,

In qemu, the aer recovery process is:
1. Detect support for resume notification.
   If the host vfio driver does not support resume notification,
   fail to boot the VM when aer is enabled.
2. Immediately notify the VM when the error is detected.
3. Disable the device.
   Unmap the config space and bar region.
4. Delay the guest directed bus reset.
5. Wait for the resume notification.
   If we don't get the resume notification from the host after
   some timeout, abort the guest directed bus reset
   altogether and unplug the device to prevent it from further
   interacting with the VM.
6. After getting the resume notification, reset the bus and enable
   the device.

I think we only need to make sure the disabled device will not
interact with the VM.

Sincerely
Zhou Jie

>
> Alex

-- 
------------------------------------------------
周潔
Dept 1
No. 6 Wenzhu Road,
Nanjing, 210012, China
TEL:+86+25-86630566-8557
FUJITSU INTERNAL:7998-8557
E-Mail:zhoujie2011@cn.fujitsu.com
------------------------------------------------
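
The six recovery steps listed in the message can be read as a small
state machine. The sketch below is only an illustration of that reading,
not code from the patch series: every type and function name
(AerDevice, aer_setup, aer_on_error, and so on) is hypothetical, the
timeout is modelled as a plain millisecond counter, and real unmapping,
resetting and unplugging are reduced to flag updates. It also assumes
that step 6 resets the bus on every resume, as the step is worded.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; they are not QEMU or VFIO APIs. */
typedef enum {
    AER_RUNNING,      /* device enabled, no error outstanding */
    AER_WAIT_RESUME,  /* error reported, device disabled, reset delayed */
    AER_UNPLUGGED     /* resume never arrived, device removed from the VM */
} AerState;

typedef struct {
    AerState state;
    bool device_enabled;  /* stands in for mapped config space and BARs */
    bool guest_notified;  /* error has been forwarded to the guest */
    bool reset_pending;   /* guest asked for a bus reset while waiting */
    int bus_resets;       /* bus resets actually performed */
    int waited_ms;        /* time spent waiting for the resume */
    int timeout_ms;       /* assumed limit for that wait */
} AerDevice;

/* Step 1: refuse to start when the host lacks resume notification. */
bool aer_setup(AerDevice *d, bool host_has_resume, int timeout_ms)
{
    if (!host_has_resume) {
        return false;
    }
    d->state = AER_RUNNING;
    d->device_enabled = true;
    d->guest_notified = false;
    d->reset_pending = false;
    d->bus_resets = 0;
    d->waited_ms = 0;
    d->timeout_ms = timeout_ms;
    return true;
}

/* Steps 2 and 3: notify the guest at once, then disable the device. */
void aer_on_error(AerDevice *d)
{
    if (d->state != AER_RUNNING) {
        return;
    }
    d->guest_notified = true;
    d->device_enabled = false;
    d->waited_ms = 0;
    d->state = AER_WAIT_RESUME;
}

/* Step 4: a guest directed bus reset is delayed while waiting. */
void aer_on_guest_reset(AerDevice *d)
{
    if (d->state == AER_WAIT_RESUME) {
        d->reset_pending = true;
    } else if (d->state == AER_RUNNING) {
        d->bus_resets++;
    }
}

/* Step 5: on timeout, abort the delayed reset and unplug the device. */
void aer_on_tick(AerDevice *d, int elapsed_ms)
{
    if (d->state != AER_WAIT_RESUME) {
        return;
    }
    d->waited_ms += elapsed_ms;
    if (d->waited_ms >= d->timeout_ms) {
        d->reset_pending = false;
        d->state = AER_UNPLUGGED;
    }
}

/* Step 6: on resume, reset the bus and enable the device again. */
void aer_on_resume(AerDevice *d)
{
    if (d->state != AER_WAIT_RESUME) {
        return; /* a late resume after unplug is ignored */
    }
    d->bus_resets++;
    d->reset_pending = false;
    d->device_enabled = true;
    d->guest_notified = false;
    d->state = AER_RUNNING;
}
```

In this sketch the device stays disabled for the whole AER_WAIT_RESUME
period, which is the one property the message asks for: a disabled
device does not interact with the VM.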