From: Leon Romanovsky <leonro@nvidia.com>
To: Jim Harris <jim.harris@samsung.com>
Cc: "Bjorn Helgaas" <helgaas@kernel.org>,
"Kuppuswamy Sathyanarayanan"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Pierre Crégut" <pierre.cregut@orange.com>
Subject: Re: [PATCH v2 1/2] PCI/IOV: Revert "PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes"
Date: Tue, 13 Feb 2024 09:37:10 +0200 [thread overview]
Message-ID: <20240213073710.GB52640@unreal> (raw)
In-Reply-To: <ZcqitnWTh+zQ+H4p@ubuntu>
On Mon, Feb 12, 2024 at 10:59:03PM +0000, Jim Harris wrote:
> On Mon, Feb 12, 2024 at 02:27:14PM -0600, Bjorn Helgaas wrote:
> > On Sun, Feb 11, 2024 at 10:48:44AM +0200, Leon Romanovsky wrote:
> > > On Fri, Feb 09, 2024 at 07:20:28PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > > > On 2/9/24 3:52 PM, Jim Harris wrote:
> > > > > If an SR-IOV enabled device is held by vfio, and the device is removed,
> > > > > vfio will hold device lock and notify userspace of the removal. If
> > > > > userspace reads the sriov_numvfs sysfs entry, that thread will be blocked
> > > > > since sriov_numvfs_show() also tries to acquire the device lock. If that
> > > > > same thread is responsible for releasing the device to vfio, it results in
> > > > > a deadlock.
> > > > >
> > > > > The proper way to detect a change to the num_VFs value is to listen for a
> > > > > sysfs event, not to add a device_lock() on the attribute _show() in the
> > > > > kernel.
> >
> > The lock was not about detecting a change; Pierre did this:
> >
> > ip monitor dev ${DEVICE} | grep --line-buffered "^${id}:" | while read line; do \
> > cat ${path}/device/sriov_numvfs; \
> >
> > which I assume works by listening for sysfs events. The problem was
> > that after the event occurred, the sriov_numvfs read got a stale value
> > (see https://bugzilla.kernel.org/show_bug.cgi?id=202991).
>
> I don't think 'ip monitor dev' listens for any sysfs events. Or at least if
> I have this running and write values to sriov_numvfs, I don't see any
> output.
>
> It looks like the original bug report was against v5.0 (matching by dates
> and the patch file attached). In that code, we have:
>
> kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> iov->num_VFs = nr_virtfn;
>
> which is identical to how the code looks today. Is it possible that
> userspace could react to this uevent and read the stale num_VFs before
> iov->num_VFs gets written here? I mean, theoretically it's possible, but
> from the bug report it seems like the scenario Pierre was facing was
> 100% reproducible.
>
> It would be great if we could get input from Pierre on this. It isn't clear
> to me from the bug report what exactly is updating the sriov_numvfs sysfs
> entry, and what is triggering that update.
>
> We could also revisit my original suggestion, which was to use a
> discrete lock just for this sysfs entry, rather than overloading the
> device lock. That probably has lower risk of introducing an unintended
> regression.
The idea that lock issues are need to be solved by adding more locks
doesn't sound good to me.
Thanks
>
> https://lore.kernel.org/linux-pci/ZXNNQkXzluoyeguu@bgt-140510-bm01.eng.stellus.in/
>
> >
> > So I would drop this sentence because I don't think it accurately
> > reflects the reason for 35ff867b7657.
> >
> > > > Since you are reverting a commit that synchronizes SysFS read
> > > > /write, please add some comments about why it is not an
> > > > issue anymore.
> > >
> > > It was never an issue, the idea that sysfs read and write should be
> > > serialized by kernel is not correct by definition.
> >
> > I think it *was* an issue. The behavior Pierre observed at was
> > clearly wrong, and we added 35ff867b7657 ("PCI/IOV: Serialize sysfs
> > sriov_numvfs reads vs writes") to resolve it.
> >
> > We should try to avoid reintroducing the problem, so I think we should
> > probably squash these two patches and describe it as a deadlock fix
> > instead of dismissing 35ff867b7657 as being based on false premises.
> >
> > It would be awesome if you had time to verify that these patches also
> > resolve the problem you saw, Pierre.
> >
> > I think we should also add:
> >
> > Fixes: 35ff867b7657 ("PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes")
> >
> > as a trigger for backporting this to kernels that include
> > 35ff867b7657.
> >
> > Bjorn
> >
> > > > > This reverts commit 35ff867b76576e32f34c698ccd11343f7d616204.
> > > > > Revert had a small conflict, the sprintf() is now changed to sysfs_emit().
> > > > >
> > > > > Link: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/
> > > > > Suggested-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > Signed-off-by: Jim Harris <jim.harris@samsung.com>
> > > > > ---
> > > > > drivers/pci/iov.c | 8 +-------
> > > > > 1 file changed, 1 insertion(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > > > index aaa33e8dc4c9..0ca20cd518d5 100644
> > > > > --- a/drivers/pci/iov.c
> > > > > +++ b/drivers/pci/iov.c
> > > > > @@ -395,14 +395,8 @@ static ssize_t sriov_numvfs_show(struct device *dev,
> > > > > char *buf)
> > > > > {
> > > > > struct pci_dev *pdev = to_pci_dev(dev);
> > > > > - u16 num_vfs;
> > > > > -
> > > > > - /* Serialize vs sriov_numvfs_store() so readers see valid num_VFs */
> > > > > - device_lock(&pdev->dev);
> > > > > - num_vfs = pdev->sriov->num_VFs;
> > > > > - device_unlock(&pdev->dev);
> > > > >
> > > > > - return sysfs_emit(buf, "%u\n", num_vfs);
> > > > > + return sysfs_emit(buf, "%u\n", pdev->sriov->num_VFs);
> > > > > }
> > > > >
> > > > > /*
> > > > >
> > > > --
> > > > Sathyanarayanan Kuppuswamy
> > > > Linux Kernel Developer
> > > >
next prev parent reply other threads:[~2024-02-13 7:37 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20240209235208uscas1p26c658c64cc85711cd3aa6312224164fc@uscas1p2.samsung.com>
2024-02-09 23:52 ` [PATCH v2 0/2] PCI/IOV: sriov_numvfs bug fixes Jim Harris
2024-02-09 23:52 ` [PATCH v2 1/2] PCI/IOV: Revert "PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes" Jim Harris
2024-02-10 3:20 ` Kuppuswamy Sathyanarayanan
2024-02-11 8:48 ` Leon Romanovsky
2024-02-11 19:15 ` Kuppuswamy Sathyanarayanan
2024-02-12 9:31 ` Leon Romanovsky
2024-02-12 20:27 ` Bjorn Helgaas
2024-02-12 22:59 ` Jim Harris
2024-02-13 7:37 ` Leon Romanovsky [this message]
2024-02-13 9:40 ` pierre.cregut
2024-02-13 14:59 ` Jason Gunthorpe
2024-02-13 7:34 ` Leon Romanovsky
2024-02-13 15:59 ` Bjorn Helgaas
2024-02-13 17:46 ` Leon Romanovsky
2024-02-13 18:00 ` Kuppuswamy Sathyanarayanan
2024-02-13 19:45 ` Bjorn Helgaas
2024-02-14 7:16 ` Leon Romanovsky
2024-02-14 17:04 ` Jim Harris
2024-02-14 17:50 ` Bjorn Helgaas
2024-02-14 22:55 ` Jim Harris
2024-02-09 23:52 ` [PATCH v2 2/2] pci/iov: fix kobject_uevent() ordering in sriov_enable() Jim Harris
2024-02-10 3:22 ` Kuppuswamy Sathyanarayanan
2024-02-12 15:17 ` Keith Busch
2024-02-12 15:29 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240213073710.GB52640@unreal \
--to=leonro@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=helgaas@kernel.org \
--cc=jgg@nvidia.com \
--cc=jim.harris@samsung.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=pierre.cregut@orange.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.