From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Wei Liu" <wl@xen.org>,
"George Dunlap" <george.dunlap@citrix.com>,
"Julien Grall" <julien@xen.org>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Paul Durrant" <paul@xen.org>,
"Kevin Tian" <kevin.tian@intel.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev
Date: Mon, 24 Apr 2023 14:15:27 +0000 [thread overview]
Message-ID: <87h6t574e9.fsf@epam.com> (raw)
In-Reply-To: <bb71c3f3-20a7-b020-6685-879bd4e5786d@suse.com>
Hi Jan,
Jan Beulich <jbeulich@suse.com> writes:
> On 21.04.2023 16:13, Volodymyr Babchuk wrote:
>>
>> Hi Roger,
>>
>> Roger Pau Monné <roger.pau@citrix.com> writes:
>>
>>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote:
>>>>
>>>> Hello Roger,
>>>>
>>>> Roger Pau Monné <roger.pau@citrix.com> writes:
>>>>
>>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote:
>>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote:
>>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote:
>>>>>>>> Above I have proposed another view on this. I hope, it will work for
>>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be left
>>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that
>>>>>>>> owners of those refcounts will not try to access the physical PCI
>>>>>>>> device if pci_remove_device() is already finished.
>>>>>>>
>>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an
>>>>>>> opinion. I will like to also hear Jans opinion, since he is the
>>>>>>> maintainer.
>>>>>>
>>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". Whoever
>>>>>> holds a ref is entitled to access the device. As stated before, I see only
>>>>>> two ways of getting things consistent: Either pci_remove_device() is
>>>>>> invoked upon dropping of the last ref,
>>>>>
>>>>> With this approach, what would be the implementation of
>>>>> PHYSDEVOP_manage_pci_remove? Would it just check whether the pdev
>>>>> exist and either return 0 or -EBUSY?
>>>>>
>>>>
>>>> Okay, I am preparing patches with the behavior you proposed. To test it,
>>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove
>>>> returned -EBUSY, which propagated to the linux driver. Problem is that
>>>> Linux driver can't do anything with this. It just displayed an error
>>>> message and removed device anyways. This is because Linux sends
>>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no
>>>> way to prevent the device removal. So, admin is not capable to try this
>>>> again.
>>>
>>> Ideally Linux won't remove the device, and then the admin would get to
>>> retry. Maybe the way the Linux hook is placed is not the best one?
>>> The hypervisor should be authoritative on whether a device can be
>>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an
>>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux
>>> to continue.
>>
>> Yes, it would be ideally, but Linux driver/device model is written in a
>> such way, that PCI subsystem tracks all the PCI device usage, so it can
>> be certain that it can remove the device. Thus, functions in the device
>> removal path either return void or 0. Of course, kernel does not know that
>> hypervisor has additional uses for the device, so there is no mechanisms
>> to prevent removal.
>
> Could pciback obtain a reference on behalf of the hypervisor, dropping it
> when device removal is requested (i.e. much closer to the start of that
> operation), and only if it finds that no guests use the device anymore?
Yes, it can, it this indeed will hold a reference to a pci device for a
time, but there are some consideration that made this approach not
feasible.
Basically, when an user writes to /sys/bus/pci/SBDF/remove, the
following happens:
1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the
operation anymore
[unimportant things]
N. pci_stop_dev() function is called. This function unloads a device
driver. Any good behaving driver should drop all additional references
to a device at this point.
[more unimportant things]
M. PCI subsystem drops own reference to a generic device object
So, as you can see, admin can't restart a "failed" attempt to remove a
PCI device.
On other hand, remove() function can sleep. This allows us to pause
removal process a bit and check if hypervisor had finished removing a
PCI device on its side. But, as you pointed out, this can take weeks...
--
WBR, Volodymyr
next prev parent reply other threads:[~2023-04-24 14:16 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-14 20:56 [PATCH v3 0/6] vpci: first series in preparation for vpci on ARM Volodymyr Babchuk
2023-03-14 20:56 ` [PATCH v3 2/6] xen: pci: introduce reference counting for pdev Volodymyr Babchuk
2023-03-16 16:16 ` Roger Pau Monné
2023-03-29 9:55 ` Jan Beulich
2023-03-29 10:48 ` Roger Pau Monné
2023-03-29 11:58 ` Jan Beulich
2023-04-11 23:41 ` Volodymyr Babchuk
2023-04-12 9:13 ` Roger Pau Monné
2023-04-12 21:54 ` Volodymyr Babchuk
2023-04-13 15:00 ` Roger Pau Monné
2023-04-14 1:30 ` Volodymyr Babchuk
2023-04-17 10:17 ` Roger Pau Monné
2023-04-17 10:34 ` Jan Beulich
2023-04-17 10:51 ` Roger Pau Monné
2023-04-17 11:02 ` Jan Beulich
2023-04-21 11:00 ` Volodymyr Babchuk
2023-04-21 12:24 ` Jan Beulich
2023-04-21 13:02 ` Volodymyr Babchuk
2023-04-21 13:10 ` Roger Pau Monné
2023-04-21 14:13 ` Volodymyr Babchuk
2023-04-24 7:46 ` Jan Beulich
2023-04-24 14:15 ` Volodymyr Babchuk [this message]
2023-04-24 14:27 ` Jan Beulich
2023-03-29 10:04 ` Jan Beulich
2023-03-14 20:56 ` [PATCH v3 1/6] xen: add reference counter support Volodymyr Babchuk
2023-03-16 13:54 ` Roger Pau Monné
2023-03-16 14:03 ` Jan Beulich
2023-03-16 16:21 ` Roger Pau Monné
2023-04-11 22:27 ` Volodymyr Babchuk
2023-04-12 10:12 ` Roger Pau Monné
2023-03-16 16:19 ` Roger Pau Monné
2023-03-16 16:32 ` Jan Beulich
2023-03-16 16:39 ` Roger Pau Monné
2023-03-16 16:43 ` Jan Beulich
2023-03-16 16:48 ` Roger Pau Monné
2023-03-16 16:56 ` Jan Beulich
2023-03-17 10:05 ` Roger Pau Monné
2023-03-17 14:46 ` Jan Beulich
2023-03-16 17:01 ` Jan Beulich
2023-04-11 22:38 ` Volodymyr Babchuk
2023-04-17 6:47 ` Jan Beulich
2023-03-14 20:56 ` [PATCH v3 4/6] vpci: restrict unhandled read/write operations for guests Volodymyr Babchuk
2023-03-17 8:37 ` Roger Pau Monné
2023-03-14 20:56 ` [PATCH v3 6/6] xen: pci: print reference counter when dumping pci_devs Volodymyr Babchuk
2023-03-17 8:46 ` Roger Pau Monné
2023-03-14 20:56 ` [PATCH v3 5/6] vpci: use reference counter to protect vpci state Volodymyr Babchuk
2023-03-17 8:43 ` Roger Pau Monné
2023-03-29 9:31 ` Jan Beulich
2023-03-14 20:56 ` [PATCH v3 3/6] vpci: crash domain if we wasn't able to (un) map vPCI regions Volodymyr Babchuk
2023-03-16 16:32 ` Roger Pau Monné
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h6t574e9.fsf@epam.com \
--to=volodymyr_babchuk@epam.com \
--cc=andrew.cooper3@citrix.com \
--cc=george.dunlap@citrix.com \
--cc=jbeulich@suse.com \
--cc=julien@xen.org \
--cc=kevin.tian@intel.com \
--cc=paul@xen.org \
--cc=roger.pau@citrix.com \
--cc=sstabellini@kernel.org \
--cc=wl@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.