All of lore.kernel.org
 help / color / mirror / Atom feed
From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Oleksandr Andrushchenko" <Oleksandr_Andrushchenko@epam.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Wei Liu" <wl@xen.org>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Julien Grall" <julien@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Paul Durrant" <paul@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure
Date: Sun, 9 Jul 2023 22:41:35 +0000	[thread overview]
Message-ID: <87h6qchfn4.fsf@epam.com> (raw)
In-Reply-To: <ab79bcb6-6fc8-c68f-65bf-16ce7316c3ae@suse.com>


Hi Jan,

Jan Beulich <jbeulich@suse.com> writes:

> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>> I am currently implementing your proposal (along with Jan's
>> suggestions), but I am facing ABBA deadlock with IOMMU's
>> reassign_device() call, which has this piece of code:
>> 
>>         list_move(&pdev->domain_list, &target->pdev_list);
>> 
>> My immediate change was:
>> 
>>         write_lock(&pdev->domain->pci_lock);
>>         list_del(&pdev->domain_list);
>>         write_unlock(&pdev->domain->pci_lock);
>> 
>>         write_lock(&target->pci_lock);
>>         list_add(&pdev->domain_list, &target->pdev_list);
>>         write_unlock(&target->pci_lock);
>> 
>> But this will not work because reassign_device is called from
>> pci_release_devices() which iterates over d->pdev_list, so we need to
>> take a d->pci_lock early.
>> 
>> Any suggestions on how to fix this? My idea is to remove a device from a
>> list one at time:
>> 
>> int pci_release_devices(struct domain *d)
>> {
>>     struct pci_dev *pdev;
>>     u8 bus, devfn;
>>     int ret;
>> 
>>     pcidevs_lock();
>>     write_lock(&d->pci_lock);
>>     ret = arch_pci_clean_pirqs(d);
>>     if ( ret )
>>     {
>>         pcidevs_unlock();
>>         write_unlock(&d->pci_lock);
>>         return ret;
>>     }
>> 
>>     while ( !list_empty(&d->pdev_list) )
>>     {
>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>         bus = pdev->bus;
>>         devfn = pdev->devfn;
>>         list_del(&pdev->domain_list);
>>         write_unlock(&d->pci_lock);
>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>         write_lock(&d->pci_lock);
>
> I think it needs doing almost like this, but with two more tweaks and
> no list_del() right here (first and foremost to avoid needing to
> figure whether removing early isn't going to subtly break anything;
> see below for an error case that would end up with changed behavior):
>
>     while ( !list_empty(&d->pdev_list) )
>     {
>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct pci_dev, domain_list);
>         uint16_t seg = pdev->seg;
>         uint8_t bus = pdev->bus;
>         uint8_t devfn = pdev->devfn;
>
>         write_unlock(&d->pci_lock);
>         ret = deassign_device(d, seg, bus, devfn) ?: ret;
>         write_lock(&d->pci_lock);
>     }
>
> One caveat though: The original list_for_each_entry_safe() guarantees
> the loop to complete; your use of list_del() would guarantee that too,
> but then the device wouldn't be on the list anymore if deassign_device()
> failed. Therefore I guess you will need another local list where you
> (temporarily) put all the devices which deassign_device() left on the
> list, and which you would then move back to d->pdev_list after the loop
> has finished.

Okay, taking into the account all that you wrote, I came with this
implementation:

int pci_release_devices(struct domain *d)
{
    int combined_ret;
    LIST_HEAD(failed_pdevs);

    pcidevs_lock();
    write_lock(&d->pci_lock);
    combined_ret = arch_pci_clean_pirqs(d);
    if ( combined_ret )
    {
        pcidevs_unlock();
        write_unlock(&d->pci_lock);
        return combined_ret;
    }

    while ( !list_empty(&d->pdev_list) )
    {
        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
                                                struct pci_dev,
                                                domain_list);
        uint16_t seg = pdev->seg;
        uint8_t bus = pdev->bus;
        uint8_t devfn = pdev->devfn;
        int ret;

        write_unlock(&d->pci_lock);
        ret = deassign_device(d, seg, bus, devfn);
        write_lock(&d->pci_lock);
        if ( ret )
        {
            bool still_present = false;
            const struct pci_dev *tmp;

            for_each_pdev ( d, tmp )
            {
                if ( tmp == (const struct pci_dev*)pdev )
                {
                    still_present = true;
                    break;
                }
            }
            if ( still_present )
                list_move(&pdev->domain_list, &failed_pdevs);
            combined_ret = ret;
        }
    }

    list_splice(&failed_pdevs, &d->pdev_list);
    write_unlock(&d->pci_lock);
    pcidevs_unlock();

    return combined_ret;
}


> (Whether it is sufficient to inspect the list head to
> determine whether the pdev is still on the list needs careful checking.)

I am not sure that this is enough. We dropped d->pci_lock so we can
expect that the list was permutated in a random way.


-- 
WBR, Volodymyr

  parent reply	other threads:[~2023-07-09 22:42 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-13 10:32 [PATCH v7 00/12] PCI devices passthrough on Arm, part 3 Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 01/12] vpci: introduce per-domain lock to protect vpci structure Volodymyr Babchuk
2023-06-16 16:30   ` Roger Pau Monné
2023-06-21 22:07     ` Volodymyr Babchuk
2023-06-22  8:15       ` Roger Pau Monné
2023-06-22 21:17         ` Volodymyr Babchuk
2023-06-23  8:50           ` Roger Pau Monné
2023-06-23  9:26             ` Volodymyr Babchuk
2023-06-23 17:09               ` Jan Beulich
2023-07-04 21:03             ` Volodymyr Babchuk
2023-07-05  7:11               ` Jan Beulich
2023-07-05  8:59                 ` Roger Pau Monné
2023-07-05  9:13                   ` Jan Beulich
2023-07-07  2:02                     ` Volodymyr Babchuk
2023-07-07  6:32                       ` Jan Beulich
2023-07-09 22:41                 ` Volodymyr Babchuk [this message]
2023-07-10  6:20                   ` Jan Beulich
2023-06-20 20:17   ` Stewart Hildebrand
2023-06-22 21:18     ` Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 04/12] vpci/header: implement guest BAR register handlers Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 02/12] vpci: restrict unhandled read/write operations for guests Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 03/12] vpci: add hooks for PCI device assign/de-assign Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 06/12] vpci/header: handle p2m range sets per BAR Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 05/12] rangeset: add RANGESETF_no_print flag Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 08/12] vpci/header: emulate PCI_COMMAND register for guests Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 07/12] vpci/header: program p2m with guest BAR view Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 10/12] vpci: add initial support for virtual PCI bus topology Volodymyr Babchuk
2023-06-21 12:06   ` Jan Beulich
2023-07-07 16:45     ` Oleksandr Tyshchenko
2023-06-13 10:32 ` [PATCH v7 11/12] xen/arm: translate virtual PCI bus topology for guests Volodymyr Babchuk
2023-06-21 12:01   ` Jan Beulich
2023-06-13 10:32 ` [PATCH v7 09/12] vpci/header: reset the command register when adding devices Volodymyr Babchuk
2023-06-13 10:32 ` [PATCH v7 12/12] xen/arm: account IO handlers for emulated PCI MSI-X Volodymyr Babchuk
2023-06-15  2:01 ` [PATCH v7 00/12] PCI devices passthrough on Arm, part 3 Stewart Hildebrand
2023-06-15  9:39   ` Volodymyr Babchuk
2023-06-15 12:16     ` Jan Beulich
2023-06-21 22:11       ` Volodymyr Babchuk
2023-06-22  7:48         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6qchfn4.fsf@epam.com \
    --to=volodymyr_babchuk@epam.com \
    --cc=Oleksandr_Andrushchenko@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=paul@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.