Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset


From: Collin Walling <walling@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>,
	pmorel@linux.ibm.com, Cornelia Huck <cohuck@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>,
	qemu-s390x@nongnu.org, qemu-devel@nongnu.org,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
Date: Tue, 29 Jan 2019 13:42:31 -0500
Message-ID: <aa80eac9-ed9d-b6c2-0e71-eea157c3f23e@linux.ibm.com>
In-Reply-To: <eecbc9c1-cae8-f5d6-32be-8bf71436ac2f@redhat.com>

On 1/29/19 1:20 PM, David Hildenbrand wrote:
> On 29.01.19 17:50, Pierre Morel wrote:
>> On 29/01/2019 16:11, David Hildenbrand wrote:
>>> On 29.01.19 14:50, Pierre Morel wrote:
>>>> On 29/01/2019 11:24, David Hildenbrand wrote:
>>>>>>>> I'm wondering what the architecture says regarding those events -- can
>>>>>>>> someone with access to the documentation comment?
>>>>>>>
>>>>>>> Ping. Any comments from the IBM folks?
>>>>
>>>> Hi,
>>>>
>>>> Sorry to have waited so long.
>>>> At least Collin was faster.
>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> So the idea here is that if we have a PCI device that is in the process of
>>>>>> being deconfigured and we are also in the middle of a reset, then let's
>>>>>> accelerate deconfiguring of the PCI device during the reset. Makes sense.
>>>>
>>>> to me too.
>>>> However, how do we ensure that the guest got time to respond to the
>>>> first deconfigure request?
>>>
>>> 30 seconds, then the reboot. On a reboot, I don't see why we should give
>>> a guest more time. "It's dead", rip out the card as the guest refused to
>>> hand it back. (maybe it crashed! but after a reboot the guest state is
>>> reset and brought back to life)
>>
>> I agree that 30 seconds is way enough,
>> also agree that in most cases the guest is dead.
>>
>>
>>>
>>>>
>>>>>>
>>>>>> Note:
>>>>>>
>>>>>> The callback function will deconfigure the device and put it into
>>>>>> standby mode. However, a PCI device should only go into standby from the
>>>>>> *disabled state* (which it could already be in due to the unplug
>>>>>> sequence), or from a *permanent error state* (something we should
>>>>>> hopefully never see -- this means something went seriously wrong with
>>>>>> the device).
>>>>
>>>> Not completely exact, the CHSC event 0x304, on the initiative of the
>>>> host, forces the "deconfigure state" from "configure state" generally,
>>>> whatever sub-state it has (enabled/disabled/error...).
>>>>
>>>>>
>>>>> Right, this should already have been checked before setting up the timer.
>>>>
>>>> Apropos timer, do we need a timer or wouldn't it be better to use a
>>>> delay / a timer + condition?
>>>
>>> I don't think we need a timer at all.
>>
>> Yes, if it is possible to wait synchronously 30s it seems good to me.
> 
> I mean, we don't have to wait 30 seconds at all.
> 
> 1. We send a request to the guest
> 2. It responds (after some seconds), letting go of the zPCI device
> 3. We unplug the device
> 
> 1. We send a request to the guest
> 2. Guest does not respond, request keeps pending forever
> 3. On reboot, unplug the device
> 
> This is how x86/ACPI handles it.
> 
>>
>>>
>>>>
>>>> AFAIU we get out of the unplug without waiting for any answer from the
>>>> guest and we surely get the timer triggering after the reset has been done.
>>>> That seems bad.
>>>
>>> This is the case right now, correct.
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Two things I'm concerned about:
>>>>>>
>>>>>> 1)
>>>>>>
>>>>>> What I would suggest is adding a check for the pbdev->state for
>>>>>> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
>>>>>> states, then we're safe to deconfigure and put into standby. If the
>>>>>
>>>>> We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.
>>>>>
>>>>> So for
>>>>> - ZPCI_FS_DISABLED
>>>>> - ZPCI_FS_ENABLED
>>>>> - ZPCI_FS_BLOCKED
>>>>> - ZPCI_FS_ERROR
>>>>> - ZPCI_FS_PERMANENT_ERROR
>>>>>
>>>>> We setup a timer and simply go ahead and unplug the device when the
>>>>> timer expires ("forced unplug").
>>>>
>>>> I agree only for ZPCI_FS_ENABLED; why do we need to be smooth for other
>>>> states?
>>>> ZPCI_FS_DISABLED may be a candidate, even if I do not see the interest, but
>>>> other states of the device should issue a HP_EVENT_DECONFIGURE_REQUEST
>>>> and we do not need a timer (or any delay).
>>>
>>> You can always expect that your guest driver is dead.
>>
>> hum, the device is dead.
>> Maybe the guest got hit too if the driver is not written correctly.
>>
>>>
>>>>
>>>>>
>>>>> Changing that behavior might be more invasive. Simply not unplugging in
>>>>> s390_pcihost_timer_cb() on some of these states would mean that somebody
>>>>> issued a request and that request is simply lost/ignored. Not sure if
>>>>> that is what we want. I think we need separate patches to change
>>>>> something like that. Especially
>>>>>
>>>>> 1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
>>>>> the unplug request and moves the device to ZPCI_FS_ENABLED before the
>>>>> timer expires? These are corner cases to consider.
>>>>
>>>> +1, we must ensure to do the work inside the unplug CB.
>>>>
>>>>>
>>>>> 2. Do we need a timer at all? Now that Patch #1 introduces
>>>>> unplug_requests, we are free to ignore requests from the user if the
>>>>> guest is not reacting. I would really favor getting rid of the timer
>>>>> completely. Was there a special reason why this was introduced?
>>>>
>>>> Yes, to give the guest a chance to smoothly relinquish the device
>>>> (for example, sync/clean the disk).
>>>> However, I do not think it is implemented correctly.
>>>>
>>>>>
>>>>> No other architecture (e.g. ACPI) uses such a timer. They use a simple
>>>>> flag to remember if a request is pending. I would really favor going
>>>>> into that direction.
>>>>
>>>> I am not sure that the Intel architecture is a good example. :)
>>>
>>> Right, we all learned that zPCI did it better. (sorry ;) )
>>
>> Well I really think so.
>> It is designed for several guests in parallel and shared devices.
>>
>> In such an architecture, ripping a device out of a guest may be of interest.
>> One good thing would be for the software of the guest to handle it :)
> 
> That is indeed true, but I think such a forced removal also works on
> x86. Theoretically. ("physically rip out the card"). See below.
> 
>>
>>>
>>>>
>>>> AFAIU they do not wait for the guest to have relinquished the device.
>>>> Or do they?
>>>> How long do they wait?
>>>
>>> They wait forever. And I guess we should do the same thing. If the
>>> guest driver is broken (and this is really a rare scenario!) we would
>>> not get the device back. Which is perfectly fine in my point of view. In
>>> all other scenarios I guess the guest will simply respond in a timely
>>> manner. And ripping out stuff from the guest always feels wrong. (yes
>>> the channel subsystem works like that, but here we actually have a choice)
>>>
>>> If we reboot, we can unplug the device. Otherwise, let's keep it simple
>>> and not use a timer.
>>>
>>> Thanks!
>>>

This makes more sense to me. If something goes wrong with unplugging a device,
and we use a timeout to forcefully unplug the device... it will never be
apparent to the guest or user that something went wrong in the first place!

>>
>> AFAIK We have no plan to operate on pools of PCI devices so for me I 
>> have no objection to keep it simple:
> 
> Especially one note:
> 
> There seems to be demand for a so-called "forced PCI removal" also on
> other architectures. However, this would then most probably be
> modeled on top of what we have right now.
> 
> E.g. instead of "device_del XXX" which would request the guest to let go
> of the device, there could be something like "device_del XXX,forced=true".
> 
> E.g. ask the guest. If it does not respond after some time, force remove
> it. This is basically the timer, but managed by a different level of
> software. And you can then actually decide if you want to potentially
> harm the guest OS.
> 
> Are there any other objections to getting rid of the timer?
> 
> Conny could pick up patch #1 once you get an ACK. I would send more
> patches to drop the timer and rework this patch.
> 
> Thanks Pierre!
> 
>>
>> Regards,
>> Pierre
>>
>>
>>
> 
> 

David, Pierre: good discussion.

I'm in favor of dropping the timer, especially with the notes and examples 
above. I think the PCI code will look a lot cleaner / easier-to-maintain 
without it as well.
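
To make the timer-less flow from David's two lists above concrete, here is
a minimal sketch in plain C. It is not the actual QEMU code: the struct, the
unplug_requested flag and all zpci_* function names are hypothetical
stand-ins, and only a subset of the ZPCI_FS_* states is modeled. The direct
removal for STANDBY/RESERVED is my assumption, based on the timer only being
set up for the other states today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real s390x/pci types. */
enum zpci_state {
    ZPCI_FS_RESERVED,
    ZPCI_FS_STANDBY,
    ZPCI_FS_DISABLED,
    ZPCI_FS_ENABLED,
};

struct zpci_dev {
    const char *name;
    enum zpci_state state;
    bool unplug_requested;  /* replaces the timer: only remember the request */
    bool plugged;
};

/* Final removal: the device disappears from the guest's view. */
static void zpci_unplug(struct zpci_dev *dev)
{
    assert(dev->plugged);
    dev->state = ZPCI_FS_RESERVED;
    dev->unplug_requested = false;
    dev->plugged = false;
}

/* The user asked for removal (e.g. via device_del). */
static void zpci_unplug_request(struct zpci_dev *dev)
{
    if (dev->state == ZPCI_FS_STANDBY || dev->state == ZPCI_FS_RESERVED) {
        /* Assumption: the guest does not hold the device, remove it now. */
        zpci_unplug(dev);
        return;
    }
    /* Otherwise notify the guest (omitted) and wait, with no deadline. */
    dev->unplug_requested = true;
}

/* The guest let go of the device (it deconfigured it). */
static void zpci_guest_released(struct zpci_dev *dev)
{
    dev->state = ZPCI_FS_STANDBY;
    if (dev->unplug_requested) {
        zpci_unplug(dev);
    }
}

/* Host bridge reset (reboot): finish requests the guest never answered. */
static void zpci_host_reset(struct zpci_dev *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (devs[i].plugged && devs[i].unplug_requested) {
            zpci_unplug(&devs[i]);
        }
    }
}
```

With this shape, a device whose guest never answers simply stays plugged
with unplug_requested set until the next reset, so the pending request
remains visible instead of being papered over by a timeout.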

-- 
Respectfully,
- Collin Walling
