From: Alex Williamson <alex.williamson@redhat.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] vfio-pci: PCI hot reset interface
Date: Mon, 19 Aug 2013 12:41:22 -0600
Message-ID: <1376937682.2657.15.camel@ul30vt.home>
In-Reply-To: <1376521578.13642.65.camel@ul30vt.home>
On Wed, 2013-08-14 at 17:06 -0600, Alex Williamson wrote:
> On Wed, 2013-08-14 at 16:42 -0600, Bjorn Helgaas wrote:
> > [+cc Al, linux-fsdevel for fdget/fdput usage]
> >
> > On Wed, Aug 14, 2013 at 2:10 PM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > > The current VFIO_DEVICE_RESET interface only maps to PCI use cases
> > > where we can isolate the reset to the individual PCI function. This
> > > means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
> > > transition, device specific reset, or be a singleton device on a bus
> > > for a secondary bus reset. FLR does not have widespread support,
> > > PM reset is not very reliable, and bus topology is dictated by the
> > > system and device design. We need to provide a means for a user to
> > > induce a bus reset in cases where the existing mechanisms are not
> > > available or not reliable.
> > >
> > > This device specific extension to VFIO provides the user with this
> > > ability. Two new ioctls are introduced:
> > > - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
> > > - VFIO_DEVICE_PCI_HOT_RESET
> > >
> > > The first provides the user with information about the extent of
> > > devices affected by a hot reset. This is essentially a list of
> > > devices and the IOMMU groups they belong to. The user may then
> > > initiate a hot reset by calling the second ioctl. We must be
> > > careful that the user has ownership of all the affected devices
> > > found via the first ioctl, so the second ioctl takes a list of file
> > > descriptors for the VFIO groups affected by the reset. Each group
> > > must have IOMMU protection established for the ioctl to succeed.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >
> > > This patch is dependent on v5 "pci: bus and slot reset interfaces" as
> > > well as "pci: Add probe functions for bus and slot reset".
> > >
> > > drivers/vfio/pci/vfio_pci.c | 272 +++++++++++++++++++++++++++++++++++++++++++
> > > include/uapi/linux/vfio.h | 38 ++++++
> > > 2 files changed, 309 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index cef6002..eb69bf3 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -227,6 +227,97 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> > > return 0;
> > > }
> > >
> > > +static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > > +{
> > > + (*(int *)data)++;
> > > + return 0;
> > > +}
> > > +
> > > +struct vfio_pci_fill_info {
> > > + int max;
> > > + int cur;
> > > + struct vfio_pci_dependent_device *devices;
> > > +};
> > > +
> > > +static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > +{
> > > + struct vfio_pci_fill_info *info = data;
> > > + struct iommu_group *iommu_group;
> > > +
> > > + if (info->cur == info->max)
> > > + return -EAGAIN; /* Something changed, try again */
> > > +
> > > + iommu_group = iommu_group_get(&pdev->dev);
> > > + if (!iommu_group)
> > > + return -EPERM; /* Cannot reset non-isolated devices */
> > > +
> > > + info->devices[info->cur].group_id = iommu_group_id(iommu_group);
> > > + info->devices[info->cur].segment = pci_domain_nr(pdev->bus);
> > > + info->devices[info->cur].bus = pdev->bus->number;
> > > + info->devices[info->cur].devfn = pdev->devfn;
> > > + info->cur++;
> > > + iommu_group_put(iommu_group);
> > > + return 0;
> > > +}
> > > +
> > > +struct vfio_pci_group {
> > > + struct vfio_group *group;
> > > + int id;
> > > +};
> > > +
> > > +struct vfio_pci_group_info {
> > > + int count;
> > > + struct vfio_pci_group *groups;
> > > +};
> > > +
> > > +static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data)
> > > +{
> > > + struct vfio_pci_group_info *info = data;
> > > + struct iommu_group *group;
> > > + int id, i;
> > > +
> > > + group = iommu_group_get(&pdev->dev);
> > > + if (!group)
> > > + return -EPERM;
> > > +
> > > + id = iommu_group_id(group);
> > > +
> > > + for (i = 0; i < info->count; i++)
> > > + if (info->groups[i].id == id)
> > > + break;
> > > +
> > > + iommu_group_put(group);
> > > +
> > > + return (i == info->count) ? -EINVAL : 0;
> > > +}
> > > +
> > > +static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
> > > + int (*fn)(struct pci_dev *,
> > > + void *data), void *data,
> > > + bool slot)
> > > +{
> > > + struct pci_dev *tmp;
> > > + int ret = 0;
> > > +
> > > + list_for_each_entry(tmp, &pdev->bus->devices, bus_list) {
> > > + if (slot && tmp->slot != pdev->slot)
> > > + continue;
> > > +
> > > + ret = fn(tmp, data);
> > > + if (ret)
> > > + break;
> > > +
> > > + if (tmp->subordinate) {
> > > + ret = vfio_pci_for_each_slot_or_bus(tmp, fn,
> > > + data, false);
> > > + if (ret)
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > +}
> >
> > vfio_pci_for_each_slot_or_bus() isn't really vfio-specific, is it?
>
> It's not. I originally had the callbacks split out as PCI patches, but I was
> able to simplify some things in the code by customizing it to my usage,
> so I left it here.
>
> > I mean, traversing the PCI hierarchy doesn't require vfio knowledge. I
> > think this loop (walking the bus->devices list) skips devices on
> > "virtual buses" that may be added for SR-IOV. I'm not sure that
> > pci_walk_bus() handles that correctly either, but at least if you used
> > that, we could fix the problem in one place.
>
> I didn't know about pci_walk_bus(); I'll look into switching to it.

It looks like pci_walk_bus() is a poor replacement when dealing with
slots. There might be multiple slots on a bus, or a mix of slots and
non-slots, so for each device pci_walk_bus() finds on a subordinate bus
I'd need to walk up the tree to find the parent bridge on the original
bus to figure out whether it's in the same slot. Should we have a
pci_walk_slot() function? Thanks,
Alex
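
To make the extra work concrete, here is a minimal userspace sketch of the
"walk up to the parent bridge" step. It is not kernel code: the model_*
structures and functions are hypothetical stand-ins invented for this
illustration, and model_walk_bus() only imitates the flat, depth-first
traversal that pci_walk_bus() performs. The callback has to climb back to
the starting bus for every device before it can compare slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_bus;

struct model_dev {
	int slot;                      /* slot id on its own bus, -1 if none */
	struct model_bus *bus;         /* bus this device sits on */
	struct model_bus *subordinate; /* secondary bus if this is a bridge */
	struct model_dev *next;        /* next device on the same bus */
};

struct model_bus {
	struct model_dev *self;        /* bridge leading to this bus, NULL at root */
	struct model_dev *devices;     /* head of the device list */
};

/* Climb from dev until we reach the device that sits directly on 'top'. */
static struct model_dev *model_ancestor_on_bus(struct model_dev *dev,
					       struct model_bus *top)
{
	while (dev && dev->bus != top)
		dev = dev->bus->self;
	return dev; /* NULL if dev is not below 'top' */
}

/* Flat depth-first walk of every device at or below 'top'. */
static int model_walk_bus(struct model_bus *top,
			  int (*cb)(struct model_dev *, void *), void *data)
{
	struct model_dev *dev;
	int ret;

	assert(cb != NULL);

	for (dev = top->devices; dev; dev = dev->next) {
		ret = cb(dev, data);
		if (ret)
			return ret;
		if (dev->subordinate) {
			ret = model_walk_bus(dev->subordinate, cb, data);
			if (ret)
				return ret;
		}
	}
	return 0;
}

struct model_slot_filter {
	struct model_bus *top; /* bus the walk started from */
	int slot;              /* slot we care about on that bus */
	int count;             /* devices found in or below that slot */
};

/* Callback: the slot check needs a walk up the tree for each device. */
static int model_count_in_slot(struct model_dev *dev, void *data)
{
	struct model_slot_filter *f = data;
	struct model_dev *anc = model_ancestor_on_bus(dev, f->top);

	if (anc && anc->slot == f->slot)
		f->count++;
	return 0;
}

/* Demo topology: slot 1 holds a bridge with two endpoints, slot 2 one device. */
static int model_demo_count(int slot)
{
	struct model_bus root = { NULL, NULL }, child = { NULL, NULL };
	struct model_dev bridge = { 1, &root, &child, NULL };
	struct model_dev other = { 2, &root, NULL, NULL };
	struct model_dev ep0 = { -1, &child, NULL, NULL };
	struct model_dev ep1 = { -1, &child, NULL, NULL };
	struct model_slot_filter f = { &root, slot, 0 };

	bridge.next = &other;
	root.devices = &bridge;
	child.self = &bridge;
	ep0.next = &ep1;
	child.devices = &ep0;

	model_walk_bus(&root, model_count_in_slot, &f);
	return f.count;
}
```

In this model, model_demo_count(1) counts the bridge plus its two
endpoints, while model_demo_count(2) counts only the single device in the
other slot. A dedicated slot walker would instead filter once at the top
level and recurse without the per-device climb.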