From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jan Kiszka <jan.kiszka@web.de>, qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 13/13] qdev-properties: Add pci-devaddr property
Date: Sun, 10 Jun 2012 19:22:55 +0300 [thread overview]
Message-ID: <20120610162253.GA9879@redhat.com> (raw)
In-Reply-To: <1339343897.26976.263.camel@ul30vt>
On Sun, Jun 10, 2012 at 09:58:17AM -0600, Alex Williamson wrote:
> On Sun, 2012-06-10 at 18:37 +0300, Michael S. Tsirkin wrote:
> > On Sun, Jun 10, 2012 at 09:15:10AM -0600, Alex Williamson wrote:
> > > On Sun, 2012-06-10 at 17:54 +0300, Michael S. Tsirkin wrote:
> > > > On Sun, Jun 10, 2012 at 08:41:03AM -0600, Alex Williamson wrote:
> > > > > On Sun, 2012-06-10 at 17:03 +0300, Michael S. Tsirkin wrote:
> > > > > > On Sun, Jun 10, 2012 at 07:41:51AM -0600, Alex Williamson wrote:
> > > > > > > > > >>>> vfio_pci.c contains a nice function called "parse_hostaddr". You may
> > > > > > > > > >>>> guess what it does. ;)
> > > > > > > > > >>>
> > > > > > > > > >>> Interesting. Why? This looks strange to me:
> > > > > > > > > >>> I would expect the admin to bind a device to vfio
> > > > > > > > > >>> the way it's now bound to a stub.
> > > > > > > > > >>> The pass /dev/vfioXXX to qemu.
> > > > > > > > > >>
> > > > > > > > > >> That's the "libvirt way". We surely also want the "qemu command line
> > > > > > > > > >> way" for which this kind of service is needed.
> > > > > > > > > >>
> > > > > > > > > >> Jan
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > Yes, I imagine the qemu command line passing in /dev/vfioXXX,
> > > > > > > > > > the libvirt way will pass in an fd for above. No?
> > > > > > > > >
> > > > > > > > > As far as I understand the API, there is no device file per assigned
> > > > > > > > > device.
> > > > > > > >
> > > > > > > > Does it do pci_get_domain_bus_and_slot like kvm then?
> > > > > > > > With all the warts like you have to remember to bind pci stub
> > > > > > > > or you get two drivers for one device?
> > > > > > > > If true that's unfortunate IMHO.
> > > > > >
> > > > > > I hope the answer to the above is no?
> > > > >
> > > > > No, it does a probe for devices. We need the devaddr to compare against
> > > > > dev_name of the device to figure out which device the user is attempting
> > > > > to identify.
> > > > >
> > > > > > > > > Also, this [domain:]bus:dev.fn format is more handy for the
> > > > > > > > > command line.
> > > > > > > > >
> > > > > > > > > Jan
> > > > > > > > >
> > > > > > > >
> > > > > > > > Then users could add udev rules that will name vfio devices
> > > > > > > > like this. Another interesting option: /dev/vfio/eth0/vf1.
> > > > > > > > That's better I think: no one really likes running lspci
> > > > > > > > and guessing the address from there.
> > > > > > >
> > > > > > > That's not at all how VFIO works. /dev/vfio/# represents a group, which
> > > > > > > may contain one or more devices. Even if libvirt passes a file
> > > > > > > descriptor for the group, qemu needs to know which device in the group
> > > > > > > to add to the guest, so parsing a device address is still necessary.
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > >
> > > > > > That's very unusual, and unfortunate. For example this means that I
> > > > > > must update applications just because I move a card to another slot.
> > > > > > UIO does not have this problem.
> > > > > > The fact that it's broken in kvm ATM seems to have made people
> > > > > > think it's okay, but it really is a bug. We didn't fix it
> > > > > > because vfio was supposed to be the solution.
> > > > >
> > > > > I don't know what you're talking about here. Are you suggesting that
> > > > > needing to specify -device pci-assign,host=3.0 changing to host=4.0 when
> > > > > you move a card is broken?
> > > >
> > > > Yes. Absolutely. Admin should be able to abstract it away without users
> > > > knowing anything about it.
> > >
> > > We don't have UUIDs on PCI devices, so who's to say that the device that
> > > was in slot 3 is the same device that's now in slot 4 and the user
> > > should still have access to it? That sounds even more problematic.
> >
> > PF has a driver loaded so you can identify that, and
> > identify the VF through it. Again this is really policy,
> > it should be up to the admin how to name the device.
>
> Do PFs have a UUID? Some devices support a serial number, but that's
> not related to being a PF vs VF. We need to support both PFs and VFs
> regardless of whether they have any kind of UUID.
This is a solved problem. udev has class-specific ways to
get the device id and use it to find the name.
> I think we're inventing a problem though.
You think persistent names in udev were a solution in search of a
problem?
> > > > > How does UIO avoid such a problem.
> > > >
> > > > Normally you use a misc device that you can name with udev.
> > > >
> > > > > UIO-pci
> > > > > requires the user to use pci-sysfs for resource access, so it surely
> > > > > cares about the device address.
> > > >
> > > > Only uio_pci_generic. Other uio devices let you drive the
> > > > device.
> > >
> > > If this is actually a problem, this is the first ever complaint I've
> > > heard about it. As above, I don't think we can assume the same access
> > > when a device is moved.
> >
> > I thought need for sane naming and for sysfs interface was discussed
> > multiple times. But maybe I'm misremembering.
>
> There is sane naming and a sysfs interface...
It's not sane if the admin can't rename the device without breaking
applications.
> > > > > > I do realize you want to represent a group of devices somehow but can't
> > > > > > this be solved without breaking naming devices with udev? For example, the
> > > > > > device could be a file as well. You would then use the fd to identify the
> > > > > > device within the group. And in a somewhat common case of a single device
> > > > > > within the group, you can even make opening the group optional.
> > > > > > Don't know if this fix I suggest makes sense at all but it's a real
> > > > > > problem all the same.
> > > > >
> > > > > Unfortunately, exposing individual devices just confuses the ownership
> > > > > model we require for groups. It would provide the illusion of being
> > > > > able to assign an individual device, without the reality of the
> > > > > grouping. Groups are owned either by _a_ user or by the kernel, they
> > > > > can't be split across multiple users (at least not with any guarantees
> > > > > of isolation). The current interface makes this clear. Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > So do users pass in group=/dev/vfio/1,host=0:3.0 then?
> > >
> > > No, vfio syntax is -device vfio-pci,host=0:3.0, just like pci-assign.
> > > Qemu will figure out which group that device belongs to and "do the
> > > right thing". If we add support for libvirt passing a groupfd, it will
> > > be mostly the same, just using scm_rights to get the groupfd instead of
> > > opening it directly. Thanks,
> > >
> > > Alex
> >
> > Then how do you know which /dev/vfio/# to open?
>
> This is all in the documentation patch... groups are exposed in sysfs
> in /sys/kernel/iommu_groups. Each group has a unique number which is
> exposed as a directory. Each group directory has a subdirectory called
> devices which links to all devices in the group. Each device within a
> group as an iommu_group link back to the group directory.
> The /dev/vfio/# entry matches the group number in sysfs. So it's all
> pretty easy. Thanks,
>
> Alex
So what's the problem to have devices in sysfs linked e.g. from
/sys/class/vfio/ ? udev could create the nodes e.g. in
/dev/vfio/devices/. User can then pass the device name and qemu can
figure out the group from sysfs.
--
MST
next prev parent reply other threads:[~2012-06-10 16:22 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-04 8:52 [Qemu-devel] [PATCH 00/13] pci: Cleanups & preparations for KVM device assignment Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 01/13] pci: Refactor pci_change_irq_level Jan Kiszka
2012-06-10 11:28 ` Michael S. Tsirkin
2012-06-04 8:52 ` [Qemu-devel] [PATCH 02/13] pci: Fold pci_bus_new_inplace into pci_bus_new Jan Kiszka
2012-06-07 12:51 ` Andreas Färber
2012-06-07 15:07 ` Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 03/13] pci: Introduce cached device INTx routing Jan Kiszka
2012-06-10 11:28 ` Michael S. Tsirkin
2012-06-04 8:52 ` [Qemu-devel] [PATCH 04/13] pci: Rename map_irq to route_pin Jan Kiszka
2012-06-10 11:28 ` Michael S. Tsirkin
2012-06-10 13:23 ` Michael S. Tsirkin
2012-06-04 8:52 ` [Qemu-devel] [PATCH 05/13] pci: Add pci_device_route_intx_to_irq Jan Kiszka
2012-06-07 14:32 ` Michael S. Tsirkin
2012-06-07 15:10 ` Jan Kiszka
2012-06-07 16:28 ` Michael S. Tsirkin
2012-06-07 16:46 ` Jan Kiszka
2012-06-07 16:55 ` Michael S. Tsirkin
2012-06-10 9:55 ` Michael S. Tsirkin
2012-06-10 10:08 ` Jan Kiszka
2012-06-10 10:41 ` Michael S. Tsirkin
2012-06-10 10:49 ` Jan Kiszka
2012-06-10 10:53 ` Michael S. Tsirkin
2012-06-10 14:19 ` Alex Williamson
2012-06-10 14:43 ` Michael S. Tsirkin
2012-06-10 15:25 ` Alex Williamson
2012-06-10 15:55 ` Michael S. Tsirkin
2012-06-10 16:30 ` Jan Kiszka
2012-06-10 16:50 ` Michael S. Tsirkin
2012-06-10 17:04 ` Michael S. Tsirkin
2012-06-04 8:52 ` [Qemu-devel] [PATCH 06/13] pci: Add INTx routing notifier Jan Kiszka
2012-06-07 13:14 ` Michael S. Tsirkin
2012-06-07 15:13 ` Jan Kiszka
2012-06-08 12:47 ` [Qemu-devel] [PATCH v2 " Jan Kiszka
2012-06-10 9:48 ` [Qemu-devel] [PATCH " Michael S. Tsirkin
2012-06-10 10:05 ` Jan Kiszka
2012-06-10 10:33 ` Michael S. Tsirkin
2012-06-10 10:44 ` Jan Kiszka
2012-06-10 11:11 ` Michael S. Tsirkin
2012-06-10 11:18 ` Jan Kiszka
2012-06-10 11:39 ` Michael S. Tsirkin
2012-06-10 12:09 ` Jan Kiszka
2012-06-10 12:16 ` Michael S. Tsirkin
2012-06-10 12:33 ` Jan Kiszka
2012-06-10 12:42 ` Michael S. Tsirkin
2012-06-10 12:47 ` Jan Kiszka
2012-06-10 13:19 ` Michael S. Tsirkin
2012-06-10 12:32 ` Michael S. Tsirkin
2012-06-04 8:52 ` [Qemu-devel] [PATCH 07/13] pci: Make domain and bus unsigned in pci_read_devaddr Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 08/13] pci: Export pci_parse_devaddr instead of pci_read_devaddr Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 09/13] pci: Introduce and apply PCIDeviceAddress Jan Kiszka
2012-06-10 9:37 ` Michael S. Tsirkin
2012-06-10 10:10 ` Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 10/13] pci: Fix coding style of pci_parse_devaddr Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 11/13] Move pci_parse_devaddr to qdev-properties Jan Kiszka
2012-06-07 12:57 ` Andreas Färber
2012-06-07 15:11 ` Jan Kiszka
2012-06-07 15:56 ` Andreas Färber
2012-06-08 10:57 ` Jan Kiszka
2012-06-08 12:03 ` Andreas Färber
2012-06-08 12:14 ` Jan Kiszka
2012-06-08 12:18 ` Andreas Färber
2012-06-08 12:45 ` Jan Kiszka
2012-06-08 14:17 ` Michael S. Tsirkin
2012-06-08 14:20 ` Anthony Liguori
2012-06-08 13:55 ` Anthony Liguori
2012-06-08 19:21 ` Andreas Färber
2012-06-04 8:52 ` [Qemu-devel] [PATCH 12/13] qdev-properties: Use qemu_parse_pci_devaddr for pci-devfn property Jan Kiszka
2012-06-04 8:52 ` [Qemu-devel] [PATCH 13/13] qdev-properties: Add pci-devaddr property Jan Kiszka
2012-06-10 9:35 ` Michael S. Tsirkin
2012-06-10 10:14 ` Jan Kiszka
2012-06-10 10:49 ` Michael S. Tsirkin
2012-06-10 10:52 ` Jan Kiszka
2012-06-10 10:58 ` Michael S. Tsirkin
2012-06-10 11:00 ` Jan Kiszka
2012-06-10 11:17 ` Michael S. Tsirkin
2012-06-10 11:25 ` Jan Kiszka
2012-06-10 12:01 ` Michael S. Tsirkin
2012-06-10 13:41 ` Alex Williamson
2012-06-10 14:03 ` Michael S. Tsirkin
2012-06-10 14:41 ` Alex Williamson
2012-06-10 14:54 ` Michael S. Tsirkin
2012-06-10 15:15 ` Alex Williamson
2012-06-10 15:37 ` Michael S. Tsirkin
2012-06-10 15:58 ` Alex Williamson
2012-06-10 16:22 ` Michael S. Tsirkin [this message]
2012-06-10 17:29 ` Alex Williamson
2012-06-10 17:57 ` Michael S. Tsirkin
2012-06-10 13:49 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120610162253.GA9879@redhat.com \
--to=mst@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=jan.kiszka@web.de \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).