From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<izumi.taku@jp.fujitsu.com>, <mst@redhat.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Mon, 5 Dec 2016 13:52:03 +0800
Message-ID: <58450083.9010201@cn.fujitsu.com>
In-Reply-To: <20161204083047.7e715b09@t450s.home>



On 12/04/2016 11:30 PM, Alex Williamson wrote:
> On Sun, 4 Dec 2016 20:16:42 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> On 12/01/2016 10:55 PM, Alex Williamson wrote:
>>> On Thu, 1 Dec 2016 21:40:00 +0800  
>>
>>>>> If an AER fault occurs and the user doesn't do a reset, what
>>>>> happens when that device is released and a host driver tries to make
>>>>> use of it?  The user makes no commitment to do a reset and there are
>>>>> only limited configurations where we even allow the user to perform a
>>>>> reset.
>>>>>     
>>>>
>>>> Limited? Do you mean the things __pci_dev_reset() can do?  
>>>
>>> I mean that there are significant device and guest configuration
>>> restrictions in order to support AER.  For instance, all the functions
>>> of the slot need to appear in a PCI-e topology in the guest with all
>>> the functions in the right place such that a guest bus reset translates
>>> into a host bus reset.  The physical functions cannot be split between
>>> guests even if IOMMU isolation would otherwise allow it.  The user
>>> needs to explicitly enable AER support for the devices.  A VM needs to
>>> be specifically configured for AER support in order to set any sort of
>>> expectations of a guest directed bus reset, let alone a guarantee that
>>> it will happen.  So all the existing VMs, where functions are split
>>> between guests, or the topology isn't exactly right, or AER isn't
>>> enabled see a regression from the above change as the device is no
>>> longer reset.
>>>   
>>
>> I am not clear on why the current design sets these restrictions. I
>> took a glance at older versions of qemu's patchset, and their idea is
>> to translate a guest bus reset into a host bus reset (which is
>> unreasonable[*] to me). And I guess that's the *cause* of these
>> restrictions?  Is there any other story behind these restrictions?
>>
>> [*] In the physical world, setting a bridge's secondary bus reset sends
>> a hot-reset TLP to all functions below it, triggering every device's
>> reset separately. An emulated device should behave the same, which
>> means just using each device's DeviceClass->reset method.
> 
> Are you trying to say that an FLR is equivalent to a link reset?

No.  Among the older versions of the patchset there is a patch named
"vote the function 0 to do host bus reset when aer occurred"[1].  That
is what I called "translate guest link reset to host link reset", and
it is what I find unreasonable (I also think it is done incorrectly).
So I dropped it in my v10.

[1] https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg02987.html

If "translate guest link reset to host link reset" is right, I can
understand these restrictions[2][3].

[2]. All physical functions in a single card must be assigned to the VM
     with AER enabled on each and configured on the same virtual bus.
[3]. Don't place other devices under the virtual bus in [2], whether
     physical, emulated, or paravirtual, even if the other device
     supports AER signaling.
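
To make [2] and [3] concrete, here is a rough Python sketch of a check
that a VM configuration satisfies them.  It is only an illustration:
the Assigned record, its fields, and check_aer_topology() are names
made up for this mail, not anything in QEMU or the kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assigned:
    """Hypothetical record for one device placed on a virtual bus."""
    host_slot: str        # host card the function belongs to ("" if emulated)
    function: int         # function number on that card
    vbus: str             # virtual bus the device is plugged into
    aer: bool             # whether AER is enabled for this device
    is_vfio: bool = True  # False for emulated or paravirtual devices


def check_aer_topology(card_functions, host_slot, devices):
    """Return a list of violations of restrictions [2] and [3]."""
    problems = []
    mine = [d for d in devices if d.is_vfio and d.host_slot == host_slot]

    # [2] every function of the card must be assigned to the VM ...
    missing = set(card_functions) - {d.function for d in mine}
    if missing:
        problems.append(f"functions not assigned: {sorted(missing)}")
    # ... with AER enabled on each ...
    if any(not d.aer for d in mine):
        problems.append("AER not enabled on every function")
    # ... and all of them on the same virtual bus.
    buses = {d.vbus for d in mine}
    if len(buses) > 1:
        problems.append("functions spread over several virtual buses")

    # [3] nothing else may sit on that virtual bus.
    if any(d.vbus in buses and d not in mine for d in devices):
        problems.append("other devices share the virtual bus")
    return problems
```

An empty list means both restrictions hold for the given card.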

A device's FLR calls that device's DeviceClass->reset method; a link
reset calls DeviceClass->reset of each device on the bus. So they
clearly differ.  But if there is only one vfio-pci device under the
virtual PCI bus, I think FLR can be equivalent to a link reset, right?
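
As a toy model of that distinction (plain Python, not QEMU code; the
Device and Bus classes and both functions are invented for this
sketch), an FLR resets one device while a link reset walks the bus:

```python
class Device:
    """Toy stand-in for an emulated PCI function; not a QEMU type."""

    def __init__(self, name):
        self.name = name
        self.reset_count = 0

    def reset(self):
        # Plays the role of the per-device DeviceClass->reset hook.
        self.reset_count += 1


class Bus:
    """Toy virtual bus holding the devices below a bridge."""

    def __init__(self, devices):
        self.devices = list(devices)


def flr(device):
    """Function-level reset: only the targeted device is reset."""
    device.reset()
    return [device.name]


def link_reset(bus):
    """Link (secondary bus) reset: every device on the bus is reset."""
    for dev in bus.devices:
        dev.reset()
    return [dev.name for dev in bus.devices]
```

In this model, with a single device on the bus, flr() and link_reset()
reset exactly the same set of devices.  The model says nothing about
what the physical device sees on the host side.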

> Please go read the previous discussions, especially if you're sending
> patches you don't believe in.  Thanks,
> 

I have not read the discussion of ALL versions thoroughly, but these
restrictions have existed for a long time, so I guess they are the
result of previous discussions.  If they are not, I am thinking about
dropping these restrictions[2][3] and the "aer" property, and
enabling this functionality automatically according to the device's
capability.
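
For illustration only, "according to the device's capability" could
mean looking for the AER extended capability in the device's PCIe
config space.  Below is a minimal Python sketch, assuming the caller
already has the full 4096-byte config space as bytes (for example read
from the device's sysfs "config" file, which as far as I know needs
privilege for the extended part).  The constant names mirror the Linux
pci_regs.h ones; the two functions are hypothetical helpers, not an
existing API.

```python
import struct

PCI_CFG_SPACE_EXP_SIZE = 4096  # size of PCIe extended config space
PCI_EXT_CAP_START = 0x100      # offset of the first extended capability
PCI_EXT_CAP_ID_ERR = 0x0001    # Advanced Error Reporting capability ID


def find_ext_cap(config, cap_id):
    """Return the offset of extended capability cap_id, or 0 if absent."""
    if len(config) < PCI_CFG_SPACE_EXP_SIZE:
        return 0  # no extended config space to walk
    offset = PCI_EXT_CAP_START
    seen = set()  # guards against a looping capability list
    while offset >= PCI_EXT_CAP_START and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            return 0  # empty list or device not responding
        if header & 0xFFFF == cap_id:  # bits 15:0 hold the capability ID
            return offset
        offset = (header >> 20) & 0xFFC  # bits 31:20 hold the next offset
    return 0


def device_has_aer(config):
    """True if the config space advertises the AER extended capability."""
    return find_ext_cap(config, PCI_EXT_CAP_ID_ERR) != 0
```

Whether the presence of the capability alone is enough to enable the
feature is exactly the open question above.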

-- 
Sincerely,
Cao jin


