Date: Sun, 4 Dec 2016 08:30:47 -0700
From: Alex Williamson
To: Cao jin
Subject: Re: [PATCH] vfio/pci: Support error recovery
Message-ID: <20161204083047.7e715b09@t450s.home>
In-Reply-To: <5844092A.30204@cn.fujitsu.com>
References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com>
 <20161130210413.5161aab1@t450s.home>
 <58402830.3060606@cn.fujitsu.com>
 <20161201075541.756f6332@t450s.home>
 <5844092A.30204@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 4 Dec 2016 20:16:42 +0800
Cao jin wrote:

> On 12/01/2016 10:55 PM, Alex Williamson wrote:
> > On Thu, 1 Dec 2016 21:40:00 +0800
> >
> >>> If an AER fault occurs and the user doesn't do a reset, what
> >>> happens when that device is released and a host driver tries to make
> >>> use of it? The user makes no commitment to do a reset and there are
> >>> only limited configurations where we even allow the user to perform a
> >>> reset.
> >>>
> >>
> >> Limited? Do you mean the things __pci_dev_reset() can do?
> >
> > I mean that there are significant device and guest configuration
> > restrictions in order to support AER. For instance, all the functions
> > of the slot need to appear in a PCI-e topology in the guest with all
> > the functions in the right place such that a guest bus reset translates
> > into a host bus reset.
> > The physical functions cannot be split between guests even if
> > IOMMU isolation would otherwise allow it. The user needs to
> > explicitly enable AER support for the devices. A VM needs to be
> > specifically configured for AER support in order to set any sort of
> > expectations of a guest directed bus reset, let alone a guarantee that
> > it will happen. So all the existing VMs, where functions are split
> > between guests, or the topology isn't exactly right, or AER isn't
> > enabled, see a regression from the above change as the device is no
> > longer reset.
> >
>
> I am not clear on why these restrictions are set in the current design.
> I took a glance at older versions of QEMU's patchset, and their idea
> is to translate a guest bus reset into a host bus reset (which is
> unreasonable[*] to me). I guess that's the *cause* of these
> restrictions? Are there any other stories behind these restrictions?
>
> [*] In the physical world, setting a bridge's secondary bus reset would
> send a hot-reset TLP to all functions below, triggering every device's
> reset separately. An emulated device should behave the same, which
> means just using each device's DeviceClass->reset method.

Are you trying to say that an FLR is equivalent to a link reset?
Please go read the previous discussions, especially if you're sending
patches you don't believe in. Thanks,

Alex