Re: [PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Stewart Hildebrand <stewart.hildebrand@amd.com>,
	Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>,
	Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Paul Durrant <paul@xen.org>
Subject: Re: [PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure
Date: Tue, 28 Nov 2023 22:24:12 +0000	[thread overview]
Message-ID: <87h6l5tt2s.fsf@epam.com> (raw)
In-Reply-To: <ZVeDzhzl4y6Ok9y7@macbook.local>

Hi Roger

Thank you for the review.

Roger Pau Monné <roger.pau@citrix.com> writes:

> On Thu, Oct 12, 2023 at 10:09:15PM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> 
>> Use a previously introduced per-domain read/write lock to check
>> whether vpci is present, so we are sure there are no accesses to the
>> contents of the vpci struct if not. This lock can be used (and in a
>> few cases is used right away) so that vpci removal can be performed
>> while holding the lock in write mode. Previously such removal could
>> race with vpci_read for example.
>> 
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>> 
>> 1. Per-domain's pci_rwlock is used to protect pdev->vpci structure
>> from being removed.
>> 
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock instead. To prevent
>> deadlock while locking both hwdom->pci_lock and dom_xen->pci_lock,
>> always lock hwdom first.
>> 
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>> 
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>> 
>> 4. Do not call process_pending_softirqs with any locks held. For that
>> unlock prior the call and re-acquire the locks after. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>>  - in apply_map because of the context it is called (no race condition
>>    possible)
>>  - for MSI/MSI-X debug code because it is called at the end of
>>    pdev->vpci access and no further access to pdev->vpci is made
>> 
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
>> while accessing pdevs in vpci code.
>> 
>> 6. We are removing multiple ASSERT(pcidevs_locked()) instances because
>> they are too strict now: they should be corrected to
>> ASSERT(pcidevs_locked() || rw_is_locked(&d->pci_lock)), but problem is
>> that mentioned instances does not have access to the domain
>> pointer and it is not feasible to pass a domain pointer to a function
>> just for debugging purposes.
>> 
>> There is a possible lock inversion in MSI code, as some parts of it
>> acquire pcidevs_lock() while already holding d->pci_lock.
>
> Is this going to be addressed in a further patch?
>

It is actually addressed in this patch, in the v10. I just forgot to
remove this sentence from the patch description. My bad.

This was fixed by additional parameter to allocate_and_map_msi_pirq(),
as it is being called from two places: from vpci_msi_enable(), while we
already are holding d->pci_lock, or from physdev_map_pirq(), when there
are no locks are taken.

[...]

>> @@ -2908,7 +2908,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
>>  
>>      msi->irq = irq;
>>  
>> -    pcidevs_lock();
>> +    if ( use_pci_lock )
>> +        pcidevs_lock();
>
> Instead of passing the flag it might be better if the caller can take
> the lock, as to avoid having to pass an extra parameter.
>
> Then we should also assert that either the pcidev_lock or the
> per-domain pci lock is taken?
>

This is a good idea. I'll add lock in physdev_map_pirq(). This is only
one place where we call physdev_map_pirq() without holding any lock.

>>      /* Verify or get pirq. */
>>      write_lock(&d->event_lock);
>>      pirq = allocate_pirq(d, index, *pirq_p, irq, type, &msi->entry_nr);
>> @@ -2924,7 +2925,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
>>  
>>   done:
>>      write_unlock(&d->event_lock);
>> -    pcidevs_unlock();
>> +    if ( use_pci_lock )
>> +        pcidevs_unlock();
>>      if ( ret )
>>      {
>>          switch ( type )
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 20275260b3..466725d8ca 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -602,7 +602,7 @@ static int msi_capability_init(struct pci_dev *dev,
>>      unsigned int i, mpos;
>>      uint16_t control;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>>      pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
>>      if ( !pos )
>>          return -ENODEV;
>> @@ -771,7 +771,7 @@ static int msix_capability_init(struct pci_dev *dev,
>>      if ( !pos )
>>          return -ENODEV;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>>  
>>      control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
>>      /*
>> @@ -988,8 +988,6 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
>>  {
>>      struct msi_desc *old_desc;
>>  
>> -    ASSERT(pcidevs_locked());
>> -
>>      if ( !pdev )
>>          return -ENODEV;
>>  
>> @@ -1043,8 +1041,6 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc,
>>  {
>>      struct msi_desc *old_desc;
>>  
>> -    ASSERT(pcidevs_locked());
>> -
>>      if ( !pdev || !pdev->msix )
>>          return -ENODEV;
>>  
>> @@ -1154,8 +1150,6 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off)
>>  int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
>>  		   struct pci_dev *pdev)
>>  {
>> -    ASSERT(pcidevs_locked());
>> -
>
> If you have the pdev in all the above function, you could expand the
> assert to test for pdev->domain->pci_lock?
>

Yes, you are right. This is the leftover from times where
pci_enable_msi() acquired pdev pointer by itself.

[...]

>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 767c1ba718..a52e52db96 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -172,6 +172,7 @@ bool vpci_process_pending(struct vcpu *v)
>>          if ( rc == -ERESTART )
>>              return true;
>>  
>> +        write_lock(&v->domain->pci_lock);
>>          spin_lock(&v->vpci.pdev->vpci->lock);
>>          /* Disable memory decoding unconditionally on failure. */
>>          modify_decoding(v->vpci.pdev,
>> @@ -190,6 +191,7 @@ bool vpci_process_pending(struct vcpu *v)
>>               * failure.
>>               */
>>              vpci_remove_device(v->vpci.pdev);
>> +        write_unlock(&v->domain->pci_lock);
>>      }
>>  
>>      return false;
>> @@ -201,8 +203,20 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>      struct map_data data = { .d = d, .map = true };
>>      int rc;
>>  
>> +    ASSERT(rw_is_write_locked(&d->pci_lock));
>> +
>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>> +    {
>> +        /*
>> +         * It's safe to drop and reacquire the lock in this context
>> +         * without risking pdev disappearing because devices cannot be
>> +         * removed until the initial domain has been started.
>> +         */
>> +        read_unlock(&d->pci_lock);
>>          process_pending_softirqs();
>> +        read_lock(&d->pci_lock);
>
> You are asserting the lock is taken in write mode just above the usage
> of read_{un,}lock().  Either the assert is wrong, or the usage of
> read_{un,}lock() is wrong.

Oops, looks like this is the rebasing artifact. The final version of the
code uses write locks of course. Patch "vpci/header: handle p2m range
sets per BAR" changes this piece of code too and replaces read lock
with the write lock. I'll move that change to this patch.

[...]

>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 3bec9a4153..112de56fb3 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -38,6 +38,8 @@ extern vpci_register_init_t *const __end_vpci_array[];
>>  
>>  void vpci_remove_device(struct pci_dev *pdev)
>>  {
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>>      if ( !has_vpci(pdev->domain) || !pdev->vpci )
>>          return;
>>  
>> @@ -73,6 +75,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>      const unsigned long *ro_map;
>>      int rc = 0;
>>  
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>>      if ( !has_vpci(pdev->domain) )
>>          return 0;
>>  
>> @@ -326,11 +330,12 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
>>  
>>  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  {
>> -    const struct domain *d = current->domain;
>> +    struct domain *d = current->domain;
>>      const struct pci_dev *pdev;
>>      const struct vpci_register *r;
>>      unsigned int data_offset = 0;
>>      uint32_t data = ~(uint32_t)0;
>> +    rwlock_t *lock;
>>  
>>      if ( !size )
>>      {
>> @@ -342,11 +347,21 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>       * Find the PCI dev matching the address, which for hwdom also requires
>>       * consulting DomXEN.  Passthrough everything that's not trapped.
>>       */
>> +    lock = &d->pci_lock;
>> +    read_lock(lock);
>>      pdev = pci_get_pdev(d, sbdf);
>>      if ( !pdev && is_hardware_domain(d) )
>> +    {
>> +        read_unlock(lock);
>> +        lock = &dom_xen->pci_lock;
>> +        read_lock(lock);
>>          pdev = pci_get_pdev(dom_xen, sbdf);
>
> I'm unsure whether devices assigned to dom_xen can change ownership
> after boot, so maybe there's no need for all this lock dance, as the
> device cannot disappear?
>
> Maybe just taking the hardware domain lock is enough to prevent
> concurrent accesses in that case, as the hardware domain is the only
> allowed to access devices owned by dom_xen.

This sounds correct. If there are no objections then I can remove extra
locks.

-- 
WBR, Volodymyr

next prev parent reply	other threads:[~2023-11-28 22:24 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-12 22:09 [PATCH v10 00/17] PCI devices passthrough on Arm, part 3 Volodymyr Babchuk
2023-10-12 22:09 ` [PATCH v10 01/17] pci: msi: pass pdev to pci_enable_msi() function Volodymyr Babchuk
2023-10-30 15:55   ` Jan Beulich
2023-11-17 13:59   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure Volodymyr Babchuk
2023-11-03 15:39   ` Stewart Hildebrand
2023-11-17 15:16   ` Roger Pau Monné
2023-11-28 22:24     ` Volodymyr Babchuk [this message]
2023-10-12 22:09 ` [PATCH v10 05/17] vpci: add hooks for PCI device assign/de-assign Volodymyr Babchuk
2023-11-20 15:04   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 02/17] pci: introduce per-domain PCI rwlock Volodymyr Babchuk
2023-11-17 14:33   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 04/17] vpci: restrict unhandled read/write operations for guests Volodymyr Babchuk
2023-10-12 22:09 ` [PATCH v10 06/17] vpci/header: rework exit path in init_bars Volodymyr Babchuk
2023-11-20 15:07   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 08/17] rangeset: add RANGESETF_no_print flag Volodymyr Babchuk
2023-10-12 22:09 ` [PATCH v10 07/17] vpci/header: implement guest BAR register handlers Volodymyr Babchuk
2023-10-14 16:00   ` Stewart Hildebrand
2023-11-20 16:06   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 11/17] vpci/header: program p2m with guest BAR view Volodymyr Babchuk
2023-11-21 12:24   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 09/17] rangeset: add rangeset_empty() function Volodymyr Babchuk
2023-10-13 17:54   ` Stewart Hildebrand
2023-10-13 18:08     ` Volodymyr Babchuk
2023-10-12 22:09 ` [PATCH v10 10/17] vpci/header: handle p2m range sets per BAR Volodymyr Babchuk
2023-11-20 17:29   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 12/17] vpci/header: emulate PCI_COMMAND register for guests Volodymyr Babchuk
2023-10-13 21:53   ` Volodymyr Babchuk
2023-11-21 14:17   ` Roger Pau Monné
2023-12-01  2:05     ` Volodymyr Babchuk
2023-12-01  9:04       ` Roger Pau Monné
2023-12-21 22:58     ` Stewart Hildebrand
2023-10-12 22:09 ` [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology Volodymyr Babchuk
2023-11-16 16:06   ` Julien Grall
2023-11-16 23:28     ` Stefano Stabellini
2023-11-17  0:06       ` Julien Grall
2023-11-17  0:51         ` Stefano Stabellini
2023-11-17  0:21       ` Volodymyr Babchuk
2023-11-17  0:58         ` Stefano Stabellini
2023-11-17 14:09           ` Volodymyr Babchuk
2023-11-17 18:30             ` Julien Grall
2023-11-17 20:08               ` Volodymyr Babchuk
2023-11-17 21:43                 ` Stefano Stabellini
2023-11-17 22:22                   ` Volodymyr Babchuk
2023-11-18  0:45                     ` Stefano Stabellini
2023-11-21  0:42                       ` Volodymyr Babchuk
2023-11-22  1:12                         ` Stefano Stabellini
2023-11-22 11:53                           ` Roger Pau Monné
2023-11-22 21:18                             ` Stefano Stabellini
2023-11-23  8:29                               ` Roger Pau Monné
2023-11-28 23:45                                 ` Volodymyr Babchuk
2023-11-29  8:33                                   ` Roger Pau Monné
2023-11-30  2:28                                     ` Stefano Stabellini
2023-11-21 14:40   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 14/17] xen/arm: translate virtual PCI bus topology for guests Volodymyr Babchuk
2023-11-21 15:11   ` Roger Pau Monné
2023-10-12 22:09 ` [PATCH v10 15/17] xen/arm: account IO handlers for emulated PCI MSI-X Volodymyr Babchuk
2023-10-13  8:34   ` Julien Grall
2023-10-13 13:06     ` Volodymyr Babchuk
2023-10-13 16:46       ` Julien Grall
2023-10-13 17:17         ` Volodymyr Babchuk
2023-10-12 22:09 ` [PATCH v10 16/17] xen/arm: vpci: permit access to guest vpci space Volodymyr Babchuk
2023-10-16 11:00   ` Jan Beulich
2023-10-24 19:44     ` Stewart Hildebrand
2023-10-12 22:09 ` [PATCH v10 17/17] arm/vpci: honor access size when returning an error Volodymyr Babchuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6l5tt2s.fsf@epam.com \
    --to=volodymyr_babchuk@epam.com \
    --cc=Oleksandr_Andrushchenko@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jun.nakajima@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=paul@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=stewart.hildebrand@amd.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.