qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: izumi.taku@jp.fujitsu.com, Cao jin <caoj.fnst@cn.fujitsu.com>,
	qemu-devel@nongnu.org, mst@redhat.com
Subject: Re: [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doing aer recovery
Date: Fri, 1 Apr 2016 09:53:40 +0800	[thread overview]
Message-ID: <56FDD4A4.2040703@cn.fujitsu.com> (raw)
In-Reply-To: <56FDD17B.1090304@cn.fujitsu.com>


On 04/01/2016 09:40 AM, Chen Fan wrote:
>
> On 03/31/2016 11:44 PM, Alex Williamson wrote:
>> On Thu, 31 Mar 2016 14:55:07 +0800
>> Chen Fan <chen.fan.fnst@cn.fujitsu.com> wrote:
>>
>>> On 03/25/2016 10:22 AM, Alex Williamson wrote:
>>>> On Fri, 25 Mar 2016 09:38:09 +0800
>>>> Chen Fan <chen.fan.fnst@cn.fujitsu.com> wrote:
>>>>> On 03/25/2016 06:54 AM, Alex Williamson wrote:
>>>>>> On Wed, 23 Mar 2016 18:12:06 +0800
>>>>>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>>>>>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>>>>>
>>>>>>> when a physical device aer occurred, the device state probably
>>>>>>> is not in D0 in a short time, if we recover the device quickly.
>>>>>>> we may stuck in D3 state when force to change device state to D0.
>>>>>>> we may need to wait for a short time to inject the error to guest.
>>>>>>>
>>>>>>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>>>>> ---
>>>>>>>     hw/vfio/pci.c | 3 +++
>>>>>>>     1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>> index 25fc095..5216e7f 100644
>>>>>>> --- a/hw/vfio/pci.c
>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>> @@ -2658,6 +2658,9 @@ static void vfio_err_notifier_handler(void 
>>>>>>> *opaque)
>>>>>>>             msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>>>>>> PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>>>>>>     +        /* wait a bit to ensure aer device is ready */
>>>>>>> +        usleep(2 * 1000);
>>>>>> Where does this number come from?  Why would the device be in D3?  I
>>>>>> don't understand this at all.
>>>>> Hi Alex,
>>>>>
>>>>>        when I tested the code in my environment, I found that when 
>>>>> I used
>>>>> the aer-inject module to inject a fake aer error to device on 
>>>>> host, the qemu
>>>>> would throw out the message "vfio: Unable to power on device, 
>>>>> stuck in D3"
>>>>> on and off. if I use "gdb" to debug the vfio_pci_pre_reset, the 
>>>>> phenomenon
>>>>> would not appearance, I just thought it should be some timing race 
>>>>> issue,
>>>>> so I use a sleep() to wait 2ms (double the reset time of 1ms) to 
>>>>> ensure the
>>>>> device state is ready. maybe the root reason still need to be
>>>>> investigated deeply.
>>>> Yes, it sounds like you need to investigate this further, the delay is
>>>> arbitrary and perhaps suggests a race that needs to be fixed
>>>> correctly.  Thanks,
>>> Hi Alex,
>>>
>>>       after done some investigation of the problem, I found that only
>>> when the injected
>>>    error is fatal, the problem will appear. because in aer do_recovery,
>>> host will call reset_link
>>> on the root port, which would invoke pci_reset_bridge_secondary_bus in
>>> aer_root_reset,
>>> that would reset the bridge and all the device under that. so when qemu
>>> receive the aer
>>> notification, then propagate the error to guest, guest does the same 
>>> way
>>> to perform the
>>> recovery, if the guest `reset_link` that will call the vfio_pre_reset
>>> done at the stage of host
>>> bridge reset, the device status would probable stick in D3.
>>>
>>> so I think after qemu receive the aer notification, we should wait for
>>> enough time to
>>> ensure the bridge has been reset completely. I just use sleep <=10ms to
>>> test the code,
>>> seems still appear the message "vfio: Unable to power on device, stuck
>>> in D3". so I think
>>> we should sleep 100ms to ensure the delay sufficient. I have tested 
>>> that
>>> code 100+ times
>>> by inject aer error. the issue no longer appears.
>> I'm not satisfied with this.  pci_reset_bridge_secondary_bus() is
>> invoked by both the host AER code and the guest AER code, the latter
>> via the vfio PCI hot reset interface.  The
>> pci_reset_bridge_secondary_bus() function includes the spec defined
>> delay by which point all the devices should be operational again.  The
>> spec also defines that devices are in D0 after reset, which implies
>> that the only reason we would ever be seeing a device in D3 is if we're
>> reading the device while it is still in reset or before it has
>> recovered from reset.  That implies that either
>> pci_reset_bridge_secondary_bus() is not waiting long enough or QEMU is
>> allowing one device to call vfio_pre_reset while another device is
>> still in reset.  I suspect QEMU serializes reset such that the latter
>> case is not possible, which means that you might have a device that
>> takes longer to reset than the spec defines.  Such a quirk should be
>> handled in the host kernel reset, not by adding arbitrary delays in
>> userspace code.  Thanks,
> maybe it is not the delay issue actually. let me analyze the aer process
> in do_recovery:
>
>                host qemu               guest
>
>         broadcast: error_detected
>         +--> error_detected (VFIO)
>           +--> eventfd_signal   ----->    inject_aer ---------> 
> broadcast: error_detected
> *reset_link* +--> error_detected (real driver)
> *reset_link*
>         broadcast: mmio/slot_reset                                     
> broadcast: mmio/slot_reset
>         broadcast: 
> resume                                                  broadcast: resume
apologies to all, the above diagram was chaotic. I redraw it.

                host                             qemu        guest

broadcast: error_detected (host)
  +--> error_detected (VFIO)
    +--> eventfd_signal   ----->  inject_aer  -----> broadcast: 
error_detected (gust)
*reset_link* (host) +--> error_detected (real driver) (gust)
*reset_link* (gust)
broadcast: mmio/slot_reset (host)                      broadcast: 
mmio/slot_reset (gust)
broadcast: resume (host) broadcast: resume (gust)

Thanks,
Chen

>
> we can see that the reset_link in host and in guest are not called in 
> order. the reset_link
> in guest may execute during the host broadcast "error_detected" or 
> "reset_link" as long
> as there are many devices need to walk on the bus. so in the case qemu 
> has the opportunity
> to call vfio_pre_reset while host is still in reset.
>
> so can we consider moving the eventfd_signal in error_detected to 
> resume/slot_reset callback
> in vfio driver, then we can assure that the host reset_link finish 
> before guest do reset_link.
> or adding a new resume/slot_reset event between vfio driver and qemu 
> to delay the guest reset_link.
>
> Thanks,
> Chen
>
>
>
>
>
>
>
>
>
>
>
>>
>> Alex
>>
>>
>> .
>>
>
>
>
>
> .
>

  reply	other threads:[~2016-04-01  1:59 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-23 10:11 [Qemu-devel] [patch v5 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-03-23 10:11 ` [Qemu-devel] [patch v5 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-03-23 10:11 ` [Qemu-devel] [patch v5 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-03-23 10:11 ` [Qemu-devel] [patch v5 03/12] vfio: add pcie extended capability support Cao jin
2016-03-23 10:11 ` [Qemu-devel] [patch v5 04/12] vfio: add aer support for vfio device Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 06/12] vfio: add check host bus reset is support or not Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-03-24 22:54   ` Alex Williamson
2016-03-27 12:19     ` Michael S. Tsirkin
2016-03-31  7:23       ` Chen Fan
2016-03-27 12:52   ` Michael S. Tsirkin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-03-23 10:12 ` [Qemu-devel] [patch v5 11/12] vfio: device may stuck in D3 when doing aer recovery Cao jin
2016-03-24 22:54   ` Alex Williamson
2016-03-25  1:38     ` Chen Fan
2016-03-25  2:22       ` Alex Williamson
2016-03-31  6:55         ` Chen Fan
2016-03-31 15:44           ` Alex Williamson
2016-04-01  1:40             ` Chen Fan
2016-04-01  1:53               ` Chen Fan [this message]
2016-03-23 10:12 ` [Qemu-devel] [patch v5 12/12] vfio: add 'aer' property to expose aercap Cao jin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FDD4A4.2040703@cn.fujitsu.com \
    --to=chen.fan.fnst@cn.fujitsu.com \
    --cc=alex.williamson@redhat.com \
    --cc=caoj.fnst@cn.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).