From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Stewart Hildebrand <stewart.hildebrand@amd.com>
Cc: xen-devel@lists.xenproject.org,
	Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>,
	Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
	George Dunlap <george.dunlap@citrix.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Paul Durrant <paul@xen.org>,
	Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Subject: Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure
Date: Fri, 16 Feb 2024 12:44:56 +0100
Message-ID: <Zc9KuCeoOciUdqTN@macbook>
In-Reply-To: <20240215203001.1816811-1-stewart.hildebrand@amd.com>

On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Use the per-domain PCI read/write lock to protect the presence of the
> pci device vpci field. This lock can be used (and in a few cases is used
> right away) so that vpci removal can be performed while holding the lock
> in write mode. Previously such removal could race with vpci_read for
> example.
> 
> When taking both d->pci_lock and pdev->vpci->lock, they should be
> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> possible deadlock situations.
> 
> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock in write mode instead.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
> 
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
> 
> 4. Do not call process_pending_softirqs with any locks held. For that
> unlock prior to the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>    possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
> 
> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> while accessing pdevs in vpci code.
> 
> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> do not go away. The vPCI functions call several MSI-related functions
> which already have existing non-vPCI callers. Change those MSI-related
> functions to allow using either pcidevs_lock() or d->pci_lock for
> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> sufficient. Note that this pdev protection mechanism does not protect
> other state or critical sections. These MSI-related functions already
> have other race condition and state protection mechanisms (e.g.
> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> pcidevs_lock() is to ensure that pdevs do not go away.
> 
> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> that pdevs do not go away. The purpose of this wrapper is to aid
> readability and document the intent of the pdev protection mechanism.
> 
> 8. When possible, the existing non-vPCI callers of these MSI-related
> functions haven't been switched to use the newly introduced per-domain
> pci_lock, and will continue to use the global pcidevs_lock(). This is
> done to reduce the risk of the new locking scheme introducing
> regressions. Those users will be adjusted in due time. One exception
> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
> the caller, physdev_map_pirq(): this instance is switched to
> read_lock(&d->pci_lock) right away.
> 
> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

A couple of questions, and pdev_list_is_read_locked() needs a small
adjustment.

> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>  
> +    /*
> +     * Assert that d->pdev_list doesn't change. pdev_list_is_read_locked() is
> +     * not suitable here because we may read_unlock(&pdev->domain->pci_lock)
> +     * before returning.

I'm confused by this comment, as I don't see why it matters that the
lock might be unlocked before returning.  We need to ensure the lock is
taken at the time of the assert, and hence pdev_list_is_read_locked()
can be used.
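
To spell out what I mean, here is an untested sketch. The lock type,
read_lock()/read_unlock()/rw_is_locked() bodies and print_entries() are
simplified stand-ins I made up so it builds on its own; they are not the
real Xen implementations, and the every-64-entries interval is arbitrary.
The assert only looks at the lock state on entry, so dropping and
re-taking the lock later in the function doesn't invalidate it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a read/write lock: just counts readers. */
struct rwlock { unsigned int readers; };

static void read_lock(struct rwlock *l) { l->readers++; }
static void read_unlock(struct rwlock *l) { assert(l->readers); l->readers--; }
static bool rw_is_locked(const struct rwlock *l) { return l->readers != 0; }

/*
 * Hypothetical loop shaped like a debug-print helper: the caller must hold
 * the lock on entry, and the lock is dropped around the point where pending
 * work would be processed. Returns how many times the lock was dropped.
 */
static unsigned int print_entries(struct rwlock *pci_lock, unsigned int nr)
{
    unsigned int i, dropped = 0;

    /* Only the state at this point is checked. */
    assert(rw_is_locked(pci_lock));

    for ( i = 0; i < nr; i++ )
    {
        if ( i && !(i % 64) )
        {
            read_unlock(pci_lock);
            /* Pending work would be processed here, with no locks held. */
            dropped++;
            read_lock(pci_lock);
        }
    }

    return dropped;
}
```

Whether the lock is still held when the function returns is a separate
question from what the assert at the top verifies.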

> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
> +
>      for ( i = 0; i < msix->max_entries; i++ )
>      {
>          const struct vpci_msix_entry *entry = &msix->entries[i];
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index aabc5465a7d3..9f31cb84c9f3 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -171,6 +171,20 @@ void pcidevs_lock(void);
>  void pcidevs_unlock(void);
>  bool __must_check pcidevs_locked(void);
>  
> +#ifndef NDEBUG
> +/*
> + * Check for use in ASSERTs to ensure there will be no changes to the entries
> + * in d->pdev_list (but not the contents of each entry).
> + * This check is not suitable for protecting other state or critical regions.
> + */
> +#define pdev_list_is_read_locked(d) ({                             \
> +        /* NB: d may be evaluated multiple times, or not at all */ \
> +        pcidevs_locked() || (d && rw_is_locked(&d->pci_lock));     \

'd' is missing parentheses here, should be (d).

> +    })
> +#else
> +bool pdev_list_is_read_locked(const struct domain *d);
> +#endif

FWIW, if this is only intended to be used with ASSERT, it might as
well be an ASSERT itself:

ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ...

Don't have a strong opinion, so I'm fine with how it's used, just
noting it might be clearer if it was an ASSERT_ right away.
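
For reference, something along these lines (untested sketch). The struct
definitions and the bodies of pcidevs_locked() and rw_is_locked() are
stand-ins so that it builds on its own, check_selected() is a made-up
helper, and the ASSERT_ name is only a suggestion. I've also written it
as a plain expression rather than a statement expression to keep the
sketch standard C. The point is that the argument is parenthesized at
each use, so an argument such as `cond ? a : b` is treated as a unit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the Xen types and helpers; for illustration only. */
struct rwlock { bool locked; };
struct domain { struct rwlock pci_lock; };

static bool global_locked; /* models the state behind pcidevs_locked() */
static bool pcidevs_locked(void) { return global_locked; }
static bool rw_is_locked(const struct rwlock *l) { return l->locked; }

/* NB: d may be evaluated multiple times, or not at all. */
#define pdev_list_is_read_locked(d) \
    (pcidevs_locked() || ((d) && rw_is_locked(&(d)->pci_lock)))

/* Possible ASSERT_ flavour wrapping the check above. */
#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
    assert(pdev_list_is_read_locked(d))

/* Made-up caller passing a conditional expression as the argument. */
static bool check_selected(bool use_first, struct domain *a, struct domain *b)
{
    return pdev_list_is_read_locked(use_first ? a : b);
}
```

Without the parentheses, `use_first ? a : b && ...` would group the `&&`
with `b` only, which is not what the caller meant.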

Thanks, Roger.

