From: Pierre Morel <pmorel@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>,
Collin Walling <walling@linux.ibm.com>,
Cornelia Huck <cohuck@redhat.com>,
Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>,
qemu-s390x@nongnu.org, qemu-devel@nongnu.org,
Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
Date: Tue, 29 Jan 2019 17:50:25 +0100 [thread overview]
Message-ID: <f9959c97-463e-9681-46ef-19ea728c2025@linux.ibm.com> (raw)
In-Reply-To: <accf8243-1100-117b-9bae-899537bf433c@redhat.com>
On 29/01/2019 16:11, David Hildenbrand wrote:
> On 29.01.19 14:50, Pierre Morel wrote:
>> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>>> someone with access to the documentation comment?
>>>>>
>>>>> Ping. Any comments from the IBM folks?
>>
>> Hi,
>>
>> Sorry to have wait so long.
>> At least Collin was faster.
>>
>>>>>
>>>>
>>>>
>>>> So the idea here is that if we have a PCI device that is the process of
>>>> being deconfigured and we are also in the middle of a reset, then let's
>>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
>>
>> to me too.
>> However, how do we ensure that the guest got time to respond to the
>> first deconfigure request?
>
> 30 seconds, then the reboot. On a reboot, I don't see why we should give
> a guest more time. "It's dead", rip out the card as the guest refused to
> hand it back. (maybe it crashed! but after a reboot the guest state is
> reset and baught back to life)
I agree that 30 seconds is way enough,
also agree that in most cases the guest is dead.
>
>>
>>>>
>>>> Note:
>>>>
>>>> The callback function will deconfigure the the device and put it into
>>>> standby mode. However, a PCI device should only go into standby from the
>>>> *disabled state* (which it could already be in due to the unplug
>>>> sequence), or from a *permanent error state* (something we should
>>>> hopefully never see -- this means something went seriously wrong with
>>>> the device).
>>
>> Not completely exact, the CHSC event 0x304, on the initiative of the
>> host, force the "deconfigure state" from "configure state" generally,
>> whatever sub-state it has (enabled/disabled/error...).
>>
>>>
>>> Right, this should already have been checked before setting up the timer.
>>
>> Apropos timer, do we need a timer or wouldn't it be better to use a
>> delay / a timer + condition?
>
> I don't think we need a timer at all.
Yes, if it is possible to wait synchronously 30s it seems good to me.
>
>>
>> AFAIU we get out of the unplug without waiting for any answer from the
>> guest and we surely get the timer triggering after the reset has been done.
>> That seems bad.
>
> This is the case right now, correct.
>
>>
>>
>>>
>>>>
>>>> Two things I'm concerned about:
>>>>
>>>> 1)
>>>>
>>>> What I would suggest is adding a check for the pbdev->state for
>>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>>> states, then we're safe to deconfigure and put into standby. If the
>>>
>>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>>
>>> So for
>>> - ZPCI_FS_DISABLED
>>> - ZPCI_FS_ENABLED
>>> - ZPCI_FS_BLOCKED
>>> - ZPCI_FS_ERROR
>>> - ZPCI_FS_PERMANENT_ERROR
>>>
>>> We setup a timer and simply go ahead and unplug the device when the
>>> timer expires ("forced unplug").
>>
>> I agree only for ZPCI_FS_ENABLED why do we need to be smooth for other
>> states?
>> ZPCI_FS_DISABLED may be a candidate even I do not see the interrest but
>> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST
>> and we do not need a timer (or any delay)
>
> You can always expect that your guest driver is dead.
hum, the device is dead.
May be the guest got hit too if the driver is not right written.
>
>>
>>>
>>> Changing that behavior might be more invasive. Simply not unplugging in
>>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>>> issued a request and that requests is simply lost/ignored. Not sure if
>>> that is what we want. I think we need separate patches to change
>>> something like that. Especially
>>>
>>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>>> timer expires? These are corner cases to consider.
>>
>> +1, we must ensure to do the work inside the unplug CB.
>>
>>>
>>> 2. Do we need a timer at all? Now that Patch #1 introduces
>>> unplug_requests, we are free to ignore requests from the user if the
>>> guest is not reacting. I would really favor getting rid of the timer
>>> completely. Was there a special reason why this was introduced?
>>
>> Yes, to let a chance to the guest to smoothly relinquish the device.
>> (for example sync/clean the disk)
>> However I do not think it is right implemented.
>>
>>>
>>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>>> flag to remember if a request is pending. I would really favor going
>>> into that direction.
>>
>> I am not sure that the Intel architecture is a good example. :)
>
> Right, we all learned that zPCI did it better. (sorry ;) )
Well I really think so.
It is designed with several guest in parallel and shared devices.
In such an architecture, ripping of a device from a guest may have interest.
One good thing would be that the software of the guest handle it :)
>
>>
>> AFAIU they do not wait for the guest to have relinquish the device.
>> Or do they?
>> How long do they wait?
>
> They wait for ever. And I guess we should do the same thing. If the
> guest driver is broken (and this is really a rare scenario!) we would
> not get the device back. Which is perfectly fine in my point of view. In
> all other scenarios I guess the guest will simply respond in a timely
> manner. And ripping out stuff from the guest always feels wrong. (yes
> the channel subsystem works like that, but here we actually have a choice)
>
> If we reboot, we can unplug the device. Otherwise, let's keep it simply
> and don't use a timer.
>
> Thanks!
>
AFAIK We have no plan to operate on pools of PCI devices so for me I
have no objection to keep it simple:
Regards,
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
next prev parent reply other threads:[~2019-01-29 16:50 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-21 13:42 [Qemu-devel] [PATCH v3 0/2] s390x/pci: hotplug handler fixes and reworks David Hildenbrand
2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 1/2] s390x/pci: Introduce unplug requests and split unplug handler David Hildenbrand
2019-01-23 11:03 ` Cornelia Huck
2019-01-23 11:08 ` David Hildenbrand
2019-01-28 11:27 ` Cornelia Huck
2019-01-29 13:31 ` Pierre Morel
2019-01-29 15:14 ` David Hildenbrand
2019-01-29 16:54 ` Pierre Morel
2019-01-29 20:27 ` David Hildenbrand
2019-01-30 19:52 ` [Qemu-devel] [qemu-s390x] " Collin Walling
2019-01-31 9:31 ` David Hildenbrand
2019-01-21 13:42 ` [Qemu-devel] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset David Hildenbrand
2019-01-23 11:05 ` Cornelia Huck
2019-01-28 11:28 ` Cornelia Huck
2019-01-29 0:09 ` [Qemu-devel] [qemu-s390x] " Collin Walling
2019-01-29 10:24 ` David Hildenbrand
2019-01-29 13:50 ` Pierre Morel
2019-01-29 15:11 ` David Hildenbrand
2019-01-29 16:50 ` Pierre Morel [this message]
2019-01-29 18:20 ` David Hildenbrand
2019-01-29 18:37 ` Cornelia Huck
2019-01-29 18:42 ` Collin Walling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f9959c97-463e-9681-46ef-19ea728c2025@linux.ibm.com \
--to=pmorel@linux.ibm.com \
--cc=borntraeger@de.ibm.com \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
--cc=rth@twiddle.net \
--cc=thuth@redhat.com \
--cc=walling@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).