From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"jbeulich@suse.com" <jbeulich@suse.com>,
"julien@xen.org" <julien@xen.org>,
"sstabellini@kernel.org" <sstabellini@kernel.org>,
Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
Artem Mygaiev <Artem_Mygaiev@epam.com>,
Bertrand Marquis <bertrand.marquis@arm.com>,
Rahul Singh <rahul.singh@arm.com>
Subject: Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
Date: Fri, 11 Feb 2022 12:40:00 +0100 [thread overview]
Message-ID: <YgZLEMW9US9QjjYG@Air-de-Roger> (raw)
In-Reply-To: <c4666570-666e-6680-5ec2-adf1da51ad06@epam.com>
On Fri, Feb 11, 2022 at 07:27:39AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 10.02.22 18:16, Roger Pau Monné wrote:
> > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we are sure there are no accesses to the contents of the vpci struct
> >> if not. This lock can be used (and in a few cases is used right away)
> >> so that vpci removal can be performed while holding the lock in write
> >> mode. Previously such removal could race with vpci_read for example.
> > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> > pci_remove_device, and likely when vPCI gets also used in
> > {de}assign_device I think.
> Yes, this is indeed an issue, but I was not trying to solve it in
> context of vPCI locking yet. I think we should discuss how do
> we approach pdev locking, so I can create a patch for that.
> that being said, I would like not to solve pdev in this patch yet
>
> ...I do understand we do want to avoid that, but at the moment
> a single reliable way for making sure pdev is alive seems to
> be pcidevs_lock....
I think we will need to make pcidevs_lock a rwlock and take it in read
mode for pci_get_pdev_by_domain.
We didn't have this scenario before where PCI emulation is done in the
hypervisor, and hence the locking around those data structures has not
been designed for those use-cases.
> >
> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> from being removed.
> >>
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> being compared, which may produce a deadlock. It is not possible to
> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> the deadlock, check which registers are going to be written and acquire
> >> the lock in the appropriate mode from the beginning.
> >>
> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> access multiple pdevs at the same time, can still use a combination of the
> >> read lock and pdev->vpci->lock.
> >>
> >> 3. Optimize if ROM BAR write lock required detection by caching offset
> >> of the ROM BAR register in vpci->header->rom_reg which depends on
> >> header's type.
> >>
> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >>
> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> initialize many more fields of the struct vpci before assigning it to
> >> pdev->vpci.
> >>
> >> 6. vpci_{add|remove}_register are required to be called with the write lock
> >> held, but it is not feasible to add an assert there as it requires
> >> struct domain to be passed for that. So, add a comment about this requirement
> >> to these and other functions with the equivalent constraints.
> >>
> >> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
> >>
> >> 8. This is based on the discussion at [1].
> >>
> >> [1] https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@gmail.com/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$ [lore[.]kernel[.]org]
> >>
> >> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Suggested-by: Jan Beulich <jbeulich@suse.com>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>
> >> ---
> >> This was checked on x86: with and without PVH Dom0.
> >> ---
> >> xen/arch/x86/hvm/vmsi.c | 2 +
> >> xen/common/domain.c | 3 +
> >> xen/drivers/vpci/header.c | 8 +++
> >> xen/drivers/vpci/msi.c | 8 ++-
> >> xen/drivers/vpci/msix.c | 40 +++++++++++--
> >> xen/drivers/vpci/vpci.c | 114 ++++++++++++++++++++++++++++----------
> >> xen/include/xen/sched.h | 3 +
> >> xen/include/xen/vpci.h | 2 +
> >> 8 files changed, 146 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> >> index 13e2a190b439..351cb968a423 100644
> >> --- a/xen/arch/x86/hvm/vmsi.c
> >> +++ b/xen/arch/x86/hvm/vmsi.c
> >> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
> >> {
> >> unsigned int i;
> >>
> >> + ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
> > ^ no need for the double negation.
> Ok, will update all asserts which use !!
> >
> > Also this asserts that the lock is taken, but could be by a different
> > pCPU. I guess it's better than nothing.
> Fair enough. Do you still want the asserts or should I remove them?
Likely fine to leave them.
> >
> >> +
> >> for ( i = 0; i < msix->max_entries; i++ )
> >> {
> >> const struct vpci_msix_entry *entry = &msix->entries[i];
> > Since this function is now called with the per-domain rwlock read
> > locked it's likely not appropriate to call process_pending_softirqs
> > while holding such lock (check below).
> You are right, as it is possible that:
>
> process_pending_softirqs -> vpci_process_pending -> read_lock
>
> Even more, vpci_process_pending may also
>
> read_unlock -> vpci_remove_device -> write_lock
>
> in its error path. So, any invocation of process_pending_softirqs
> must not hold d->vpci_rwlock at least.
>
> And also we need to check that pdev->vpci was not removed
> in between or *re-created*
> >
> > We will likely need to re-iterate over the list of pdevs assigned to
> > the domain and assert that the pdev is still assigned to the same
> > domain.
> So, do you mean a pattern like the below should be used at all
> places where we need to call process_pending_softirqs?
>
> read_unlock
> process_pending_softirqs
> read_lock
> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
> <continue processing>
Something along those lines. You likely need to continue iterate using
for_each_pdev.
> >> +{
> >> + /*
> >> + * Writing the command register and ROM BAR register may trigger
> >> + * modify_bars to run which in turn may access multiple pdevs while
> >> + * checking for the existing BAR's overlap. The overlapping check, if done
> >> + * under the read lock, requires vpci->lock to be acquired on both devices
> >> + * being compared, which may produce a deadlock. It is not possible to
> >> + * upgrade read lock to write lock in such a case. So, in order to prevent
> >> + * the deadlock, check which registers are going to be written and acquire
> >> + * the lock in the appropriate mode from the beginning.
> >> + */
> >> + if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
> >> + return true;
> >> +
> >> + if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
> > No need for the comparison if rom_reg is unset. Also you can OR both
> > conditions into a single if.
> If we open code vpci_offset_cmp with a single if all this is going
> to be a bit clumsy:
>
> if ( r1_offset < r2_offset + r2_size &&
> r2_offset < r1_offset + r1_size )
> return 0;
> This is a single check.
> Now we need to check two registers with the code above and
> also check that pdev->vpci->header.rom_reg != 0
>
> I think it would be more readable if we have a tiny helper function
>
> static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size,
> unsigned int r2_offset, unsigned int r2_size)
> {
> /* Return 0 if registers overlap. */
> if ( r1_offset < r2_offset + r2_size &&
> r2_offset < r1_offset + r1_size )
> return false;
> return true;
> }
>
> So, then we can have something like
>
> static bool vpci_header_write_lock(const struct pci_dev *pdev,
> unsigned int start, unsigned int size)
> {
> if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ||
> (pdev->vpci->header.rom_reg &&
> !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) )
> return true;
>
> return false;
> }
Just create an 'overlaps' static function in header.c.
Thanks, Roger.
>
next prev parent reply other threads:[~2022-02-11 11:40 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-09 13:36 [PATCH] vpci: introduce per-domain lock to protect vpci structure Oleksandr Andrushchenko
2022-02-10 16:16 ` Roger Pau Monné
2022-02-11 7:27 ` Oleksandr Andrushchenko
2022-02-11 11:40 ` Roger Pau Monné [this message]
2022-02-11 12:13 ` Oleksandr Andrushchenko
2022-02-11 15:44 ` Roger Pau Monné
2022-02-14 6:33 ` Oleksandr Andrushchenko
2022-02-14 8:47 ` Roger Pau Monné
2022-02-14 8:53 ` Oleksandr Andrushchenko
2022-02-14 9:36 ` Oleksandr Andrushchenko
2022-02-14 10:34 ` Roger Pau Monné
2022-02-14 10:53 ` Oleksandr Andrushchenko
2022-02-14 11:11 ` Roger Pau Monné
2022-02-14 11:15 ` Oleksandr Andrushchenko
2022-02-14 11:25 ` Roger Pau Monné
2022-02-14 11:37 ` Oleksandr Andrushchenko
2022-02-14 12:57 ` Jan Beulich
2022-02-14 13:13 ` Oleksandr Andrushchenko
2022-02-14 13:22 ` Jan Beulich
2022-02-14 13:27 ` Oleksandr Andrushchenko
2022-02-14 13:48 ` Jan Beulich
2022-02-14 14:00 ` Oleksandr Andrushchenko
2022-02-14 14:09 ` Jan Beulich
2022-02-15 8:30 ` Roger Pau Monné
2022-02-15 8:39 ` Oleksandr Andrushchenko
2022-02-11 8:46 ` Oleksandr Andrushchenko
2022-02-11 11:51 ` Roger Pau Monné
2022-02-11 12:14 ` Oleksandr Andrushchenko
2022-02-14 14:19 ` Jan Beulich
2022-02-14 14:26 ` Oleksandr Andrushchenko
2022-02-14 14:31 ` Jan Beulich
2022-02-14 14:34 ` Oleksandr Andrushchenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YgZLEMW9US9QjjYG@Air-de-Roger \
--to=roger.pau@citrix.com \
--cc=Artem_Mygaiev@epam.com \
--cc=Oleksandr_Andrushchenko@epam.com \
--cc=Oleksandr_Tyshchenko@epam.com \
--cc=Volodymyr_Babchuk@epam.com \
--cc=bertrand.marquis@arm.com \
--cc=jbeulich@suse.com \
--cc=julien@xen.org \
--cc=rahul.singh@arm.com \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.