From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"Oleksandr Andrushchenko" <Oleksandr_Andrushchenko@epam.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Wei Liu" <wl@xen.org>,
"George Dunlap" <george.dunlap@citrix.com>,
"Julien Grall" <julien@xen.org>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Paul Durrant" <paul@xen.org>
Subject: Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Date: Fri, 7 Jul 2023 02:02:52 +0000 [thread overview]
Message-ID: <87mt08mqb9.fsf@epam.com> (raw)
In-Reply-To: <a964cffa-fb42-b0e0-e60a-1044d8794193@suse.com>
Hi Jan,
Jan Beulich <jbeulich@suse.com> writes:
> On 05.07.2023 10:59, Roger Pau Monné wrote:
>> On Wed, Jul 05, 2023 at 09:11:10AM +0200, Jan Beulich wrote:
>>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>>> I am currently implementing your proposal (along with Jan's
>>>> suggestions), but I am facing ABBA deadlock with IOMMU's
>>>> reassign_device() call, which has this piece of code:
>>>>
>>>> list_move(&pdev->domain_list, &target->pdev_list);
>>>>
>>>> My immediate change was:
>>>>
>>>> write_lock(&pdev->domain->pci_lock);
>>>> list_del(&pdev->domain_list);
>>>> write_unlock(&pdev->domain->pci_lock);
>>>>
>>>> write_lock(&target->pci_lock);
>>>> list_add(&pdev->domain_list, &target->pdev_list);
>>>> write_unlock(&target->pci_lock);
>>>>
>>>> But this will not work because reassign_device is called from
>>>> pci_release_devices() which iterates over d->pdev_list, so we need to
>>>> take a d->pci_lock early.
>>>>
>>>> Any suggestions on how to fix this? My idea is to remove a device from a
>>>> list one at time:
>>>>
>>>> int pci_release_devices(struct domain *d)
>>>> {
>>>> struct pci_dev *pdev;
>>>> u8 bus, devfn;
>>>> int ret;
>>>>
>>>> pcidevs_lock();
>>>> write_lock(&d->pci_lock);
>>>> ret = arch_pci_clean_pirqs(d);
>>>> if ( ret )
>>>> {
>>>> pcidevs_unlock();
>>>> write_unlock(&d->pci_lock);
>>>> return ret;
>>>> }
>>>>
>>>> while ( !list_empty(&d->pdev_list) )
>>>> {
>>>> pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>> bus = pdev->bus;
>>>> devfn = pdev->devfn;
>>>> list_del(&pdev->domain_list);
>>>> write_unlock(&d->pci_lock);
>>>> ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>> write_lock(&d->pci_lock);
>>>
>>> I think it needs doing almost like this, but with two more tweaks and
>>> no list_del() right here (first and foremost to avoid needing to
>>> figure whether removing early isn't going to subtly break anything;
>>> see below for an error case that would end up with changed behavior):
>>>
>>> while ( !list_empty(&d->pdev_list) )
>>> {
>>> const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct pci_dev, domain_list);
>>> uint16_t seg = pdev->seg;
>>> uint8_t bus = pdev->bus;
>>> uint8_t devfn = pdev->devfn;
>>>
>>> write_unlock(&d->pci_lock);
>>
>> I think you need to remove the device from the pdev_list before
>> dropping the lock, or else release could race with other operations.
>>
>> That's unlikely, but still if the lock is dropped and the routine
>> needs to operate on the device it is better remove such device from
>> the domain so other operations cannot get a reference to it.
>>
>> Otherwise you could modify reassign_device() implementations so they
>> require the caller to hold the source->pci_lock when calling the
>> routine, but that's ugly because the lock would need to be dropped in
>> order to reassign the device from source to target domains.
>>
>> Another option would be to move the whole d->pdev_list to a local
>> variable (so that d->pdev_list would be empty) and then iterate over
>> it without the d->pci_lock. On failure you would take the lock and
>> add the failing device back into d->pdev_list.
>
> Conceptually I like this last variant, but like the individual list_del()
> it requires auditing code for no dependency on the device still being on
> that list. In fact deassign_device()'s use of pci_get_pdev() does. The
> function would then need changing to have struct pci_dev * passed in.
> Yet who knows where else there are uses of pci_get_pdev() lurking.
Okay, so I changed deassign_device() signature and reworked the loop
in pci_release_devices() in a such way:
INIT_LIST_HEAD(&tmp_list);
/* Move all entries to tmp_list, so we can drop d->pci_lock */
list_splice_init(&d->pdev_list, &tmp_list);
write_unlock(&d->pci_lock);
list_for_each_entry_safe ( pdev, tmp, &tmp_list, domain_list )
{
pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
rc = deassign_device(d, pdev);
if ( rc )
{
/* Return device back to the domain list */
write_lock(&d->pci_lock);
list_add(&pdev->domain_list, &d->pdev_list);
write_unlock(&d->pci_lock);
func_ret = rc;
}
}
Also, I checked for all pci_get_pdev() calls and found that struct
domain (the first parameter) is passed only in handful of places:
*** xen/drivers/vpci/vpci.c:
vpci_read[504] pdev = pci_get_pdev(d, sbdf);
vpci_read[506] pdev = pci_get_pdev(dom_xen, sbdf);
vpci_write[621] pdev = pci_get_pdev(d, sbdf);
vpci_write[623] pdev = pci_get_pdev(dom_xen, sbdf);
*** xen/arch/x86/irq.c:
map_domain_pirq[2166] pdev = pci_get_pdev(d, msi->sbdf);
*** xen/drivers/passthrough/pci.c:
XEN_GUEST_HANDLE_PARAM[1712] pdev = pci_get_pdev(d, machine_sbdf);
The last one is due to my change to deassign_device() signature.
==============================
d->pdev_list can be accessed there:
*** xen/drivers/passthrough/amd/pci_amd_iommu.c:
reassign_device[489] list_add(&pdev->domain_list, &target->pdev_list);
*** xen/drivers/passthrough/pci.c:
_pci_hide_device[463] list_add(&pdev->domain_list, &dom_xen->pdev_list);
pci_get_pdev[561] list_for_each_entry ( pdev, &d->pdev_list, domain_list )
pci_add_device[759] list_add(&pdev->domain_list, &hardware_domain->pdev_list);
pci_release_devices[917] list_splice_init(&d->pdev_list, &tmp_list);
pci_release_devices[922] pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
pci_release_devices[928] list_add(&pdev->domain_list, &d->pdev_list);
_setup_hwdom_pci_devices[1155] list_add(&pdev->domain_list, &ctxt->d->pdev_list);
*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2819] list_add(&pdev->domain_list, &target->pdev_list);
*** xen/include/xen/pci.h:
for_each_pdev[149] list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
has_arch_pdevs[151] #define has_arch_pdevs(d) (!list_empty(&(d)->pdev_list))
==============================
And has_arch_pdevs() is used there:
*** xen/arch/x86/hvm/hvm.c:
hvm_set_cr0[2388] has_arch_pdevs(d)) )
*** xen/arch/x86/hvm/vmx/vmcs.c:
vmx_do_resume[1892] if ( has_arch_pdevs(v->domain) && !iommu_snoop
*** xen/arch/x86/mm.c:
l1_disallow_mask[172] !has_arch_pdevs(d) && \
*** xen/arch/x86/mm/p2m-pod.c:
p2m_pod_set_mem_target[352] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
guest_physmap_mark_populate_on_demand[1404] if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
*** xen/arch/x86/mm/paging.c:
paging_log_dirty_enable[208] if ( has_arch_pdevs(d) )
*** xen/drivers/passthrough/vtd/iommu.c:
reassign_device_ownership[2773] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2807] if ( !has_arch_pdevs(target) )
reassign_device_ownership[2825] if ( !has_arch_pdevs(source) )
has_arch_pdevs() bothers me most, actually, because it is not always
obvious how to add locking for the callers. I am planning to rework it
in the following way:
#define has_arch_pdevs_unlocked(d) (!list_empty(&(d)->pdev_list))
static inline bool has_arch_pdevs(struct domain *d)
{
bool ret;
read_lock(&d->pci_lock);
ret = has_arch_pdevs_unlocked(d);
read_unlock(&d->pci_lock);
return ret;
}
And then use appropriate macro/function depending on circumstances.
--
WBR, Volodymyr
next prev parent reply other threads:[~2023-07-07 2:03 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-13 10:32 [PATCH v7 00/12] PCI devices passthrough on Arm, part 3 Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure Volodymyr Babchuk
2023-06-16 16:30 ` Roger Pau Monné
2023-06-21 22:07 ` Volodymyr Babchuk
2023-06-22 8:15 ` Roger Pau Monné
2023-06-22 21:17 ` Volodymyr Babchuk
2023-06-23 8:50 ` Roger Pau Monné
2023-06-23 9:26 ` Volodymyr Babchuk
2023-06-23 17:09 ` Jan Beulich
2023-07-04 21:03 ` Volodymyr Babchuk
2023-07-05 7:11 ` Jan Beulich
2023-07-05 8:59 ` Roger Pau Monné
2023-07-05 9:13 ` Jan Beulich
2023-07-07 2:02 ` Volodymyr Babchuk [this message]
2023-07-07 6:32 ` Jan Beulich
2023-07-09 22:41 ` Volodymyr Babchuk
2023-07-10 6:20 ` Jan Beulich
2023-06-20 20:17 ` Stewart Hildebrand
2023-06-22 21:18 ` Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 02/12] vpci: restrict unhandled read/write operations for guests Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 03/12] vpci: add hooks for PCI device assign/de-assign Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 04/12] vpci/header: implement guest BAR register handlers Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 08/12] vpci/header: emulate PCI_COMMAND register for guests Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 05/12] rangeset: add RANGESETF_no_print flag Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 07/12] vpci/header: program p2m with guest BAR view Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 06/12] vpci/header: handle p2m range sets per BAR Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 11/12] xen/arm: translate virtual PCI bus topology for guests Volodymyr Babchuk
2023-06-21 12:01 ` Jan Beulich
2023-06-13 10:32 ` [PATCH v7 09/12] vpci/header: reset the command register when adding devices Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 10/12] vpci: add initial support for virtual PCI bus topology Volodymyr Babchuk
2023-06-21 12:06 ` Jan Beulich
2023-07-07 16:45 ` Oleksandr Tyshchenko
2023-06-13 10:32 ` [PATCH v7 12/12] xen/arm: account IO handlers for emulated PCI MSI-X Volodymyr Babchuk
2023-06-15 2:01 ` [PATCH v7 00/12] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
2023-06-15 9:39 ` Volodymyr Babchuk
2023-06-15 12:16 ` Jan Beulich
2023-06-21 22:11 ` Volodymyr Babchuk
2023-06-22 7:48 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mt08mqb9.fsf@epam.com \
--to=volodymyr_babchuk@epam.com \
--cc=Oleksandr_Andrushchenko@epam.com \
--cc=andrew.cooper3@citrix.com \
--cc=george.dunlap@citrix.com \
--cc=jbeulich@suse.com \
--cc=julien@xen.org \
--cc=paul@xen.org \
--cc=roger.pau@citrix.com \
--cc=sstabellini@kernel.org \
--cc=wl@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.