From: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
virtualization-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Eric Van Hensbergen
<ericvanhensbergen-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH 3/3] virtio PCI device
Date: Tue, 20 Nov 2007 16:16:43 -0600
Message-ID: <47435CCB.1050506@us.ibm.com>
In-Reply-To: <4743076F.8000105-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>
>>> Anthony Liguori wrote:
>>>
>>>> This is a PCI device that implements a transport for virtio. It
>>>> allows virtio
>>>> devices to be used by QEMU based VMMs like KVM or Xen.
>>>>
>>>> +
>>>> +/* the notify function used when creating a virt queue */
>>>> +static void vp_notify(struct virtqueue *vq)
>>>> +{
>>>> + struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
>>>> + struct virtio_pci_vq_info *info = vq->priv;
>>>> +
>>>> + /* we write the queue's selector into the notification
>>>> register to
>>>> + * signal the other end */
>>>> + iowrite16(info->queue_index, vp_dev->ioaddr +
>>>> VIRTIO_PCI_QUEUE_NOTIFY);
>>>> +}
>>>>
>>> This means we can't kick multiple queues with one exit.
>>>
>>
>> There is no interface in virtio currently to batch multiple queue
>> notifications so the only way one could do this AFAICT is to use a
>> timer to delay the notifications. Were you thinking of something else?
>>
>>
>
> No. We can change virtio though, so let's have a flexible ABI.
Well, please propose the virtio API first and then I'll adjust the PCI
ABI. I don't want to build things into the ABI that we never actually
end up using in virtio :-)
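
To make the batching idea concrete, here is a minimal sketch of what a batched kick could look like if the notify register took a bitmask of queue indices instead of a single index. Nothing here is in the patch or in virtio: `kick_batch`, `fake_device`, and the bitmask register are all hypothetical, and the device is only a stand-in that counts writes (one write standing for one exit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in the patch or in virtio. */
#define SKETCH_MAX_QUEUES 16u

/* Guest side: queues that still need a kick, one bit per queue index. */
struct kick_batch {
    uint16_t pending;
};

/* Stand-in for the device model: counts register writes (one exit each)
 * and how often each queue was kicked. */
struct fake_device {
    unsigned exits;
    unsigned kicks[SKETCH_MAX_QUEUES];
};

/* Record that a queue needs a kick without touching the device. */
static void kick_batch_add(struct kick_batch *b, unsigned queue_index)
{
    assert(queue_index < SKETCH_MAX_QUEUES);
    b->pending |= (uint16_t)(1u << queue_index);
}

/* Device side of a single write to the assumed bitmask notify register. */
static void fake_device_notify_mask(struct fake_device *dev, uint16_t mask)
{
    dev->exits++;
    for (unsigned i = 0; i < SKETCH_MAX_QUEUES; i++)
        if (mask & (1u << i))
            dev->kicks[i]++;
}

/* Flush all pending kicks with one write; no write if nothing is pending. */
static void kick_batch_flush(struct kick_batch *b, struct fake_device *dev)
{
    if (b->pending == 0)
        return;
    fake_device_notify_mask(dev, b->pending);
    b->pending = 0;
}
```

Note that a bitmask ties the maximum number of queues to the register width, and that something in virtio would still have to decide when to call the flush.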
>>> I'd also like to see a hypercall-capable version of this (but that
>>> can wait).
>>>
>>
>> That can be a different device.
>>
>
> That means the user has to select which device to expose. With
> feature bits, the hypervisor advertises both pio and hypercalls, the
> guest picks whatever it wants.
I was thinking more along the lines that a hypercall-based device would
certainly be implemented in-kernel whereas the current device is
naturally implemented in userspace. We can simply use a different
device for in-kernel drivers than for userspace drivers. There's no
point at all in doing a hypercall-based userspace device IMHO.
>> I don't think so. A vmexit is required to lower the IRQ line. It
>> may be possible to do something clever like set a shared memory value
>> that's checked on every vmexit. I think it's very unlikely that it's
>> worth it though.
>>
>
> Why so unlikely? Not all workloads will have good batching.
It's pretty invasive. I think a more paravirt device that expected an
edge triggered interrupt would be a better solution for those types of
devices.
>>>> + return ret;
>>>> +}
>>>> +
>>>> +/* the config->find_vq() implementation */
>>>> +static struct virtqueue *vp_find_vq(struct virtio_device *vdev,
>>>> unsigned index,
>>>> + bool (*callback)(struct virtqueue *vq))
>>>> +{
>>>> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>> + struct virtio_pci_vq_info *info;
>>>> + struct virtqueue *vq;
>>>> + int err;
>>>> + u16 num;
>>>> +
>>>> + /* Select the queue we're interested in */
>>>> + iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>>>
>>> I would really like to see this implemented as pci config space,
>>> with no tricks like multiplexing several virtqueues on one register.
>>> Something like the PCI BARs where you have all the register numbers
>>> allocated statically to queues.
>>>
>>
>> My first implementation did that. I switched to using a selector
>> because it reduces the amount of PCI config space used and does not
>> limit the number of queues defined by the ABI as much.
>>
>
> But... it's tricky, and it's nonstandard. With pci config, you can do
> live migration by shipping the pci config space to the other side.
> With the special iospace, you need to encode/decode it.
None of the PCI devices currently work like that in QEMU. It would be
very hard to make a device that worked this way because the order in
which values are written matters a whole lot. For instance, if you
wrote the status register before the queue information, the driver could
get into a funky state.

We'll still need save/restore routines for virtio devices. I don't
really see this as a problem since we do this for every other device.
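
As a rough illustration of the ordering point, the sketch below models a selector-style device with explicit save/restore routines that replay the queue information before the status register. It is not QEMU code and not the patch's register layout: `sketch_dev`, `sketch_saved`, the four-queue limit, and the `queues_seen_at_status` bookkeeping field are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model; names and layout are illustrative only. */
#define SKETCH_NUM_QUEUES 4u

struct sketch_dev {
    uint16_t queue_sel;                     /* currently selected queue */
    uint32_t queue_pfn[SKETCH_NUM_QUEUES];  /* ring pfn per queue */
    uint8_t  status;
    unsigned queues_seen_at_status;         /* configured queues when status was last written */
};

/* Flat snapshot of the device state that a save routine would ship. */
struct sketch_saved {
    uint16_t queue_sel;
    uint32_t queue_pfn[SKETCH_NUM_QUEUES];
    uint8_t  status;
};

static void dev_write_queue_sel(struct sketch_dev *d, uint16_t sel)
{
    assert(sel < SKETCH_NUM_QUEUES);
    d->queue_sel = sel;
}

/* Writes go to whichever queue the selector currently points at. */
static void dev_write_queue_pfn(struct sketch_dev *d, uint32_t pfn)
{
    d->queue_pfn[d->queue_sel] = pfn;
}

/* The status write is where the device looks at the queue setup. */
static void dev_write_status(struct sketch_dev *d, uint8_t status)
{
    unsigned configured = 0;
    for (unsigned i = 0; i < SKETCH_NUM_QUEUES; i++)
        if (d->queue_pfn[i] != 0)
            configured++;
    d->queues_seen_at_status = configured;
    d->status = status;
}

static void dev_save(const struct sketch_dev *d, struct sketch_saved *s)
{
    s->queue_sel = d->queue_sel;
    memcpy(s->queue_pfn, d->queue_pfn, sizeof s->queue_pfn);
    s->status = d->status;
}

/* Restore in a safe order: queue information first, status last. */
static void dev_restore(struct sketch_dev *d, const struct sketch_saved *s)
{
    for (unsigned i = 0; i < SKETCH_NUM_QUEUES; i++) {
        dev_write_queue_sel(d, (uint16_t)i);
        dev_write_queue_pfn(d, s->queue_pfn[i]);
    }
    dev_write_queue_sel(d, s->queue_sel);
    dev_write_status(d, s->status);
}
```

A blind copy of the register contents has no way to express that ordering, which is why the restore routine here knows about it explicitly.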
> Not much of an argument, I know.
>
>
> wrt. number of queues, 8 queues will consume 32 bytes of pci space if
> all you store is the ring pfn.
You also at least need a num argument which takes you to 48 or 64
depending on whether you care about strange formatting. 8 queues may
not be enough either. Eric and I have discussed whether the 9p virtio
device should support multiple mounts per virtio device and if so,
whether each one should have its own queue. Any device that supports
this sort of multiplexing will very quickly start using a lot of queues.

I think most types of hardware have some notion of a selector or mode.
Take a look at the LSI adapter or even VGA.
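
The arithmetic behind those numbers can be sketched as follows, assuming a 4-byte ring pfn (which is what 32 bytes for 8 queues implies) and a 2-byte num (matching the u16 in vp_find_vq). Reading "48 or 64" as tightly packed 6-byte slots versus slots padded to 8 bytes is an interpretation, and the helper names and the 2-byte selector are made up for the example.

```c
#include <assert.h>

/* Assumed field sizes: a 4-byte ring pfn and a 2-byte queue size (num). */
#define SKETCH_PFN_BYTES 4u
#define SKETCH_NUM_BYTES 2u

/* Bytes used if all that is stored per queue is the ring pfn. */
static unsigned pfn_only_bytes(unsigned nqueues)
{
    return nqueues * SKETCH_PFN_BYTES;
}

/* Bytes used when every queue gets its own statically allocated slot
 * holding pfn + num. slot_align is the slot alignment (1 = tightly packed). */
static unsigned static_layout_bytes(unsigned nqueues, unsigned slot_align)
{
    unsigned slot = SKETCH_PFN_BYTES + SKETCH_NUM_BYTES;

    assert(slot_align != 0);
    /* Round the slot size up to the next multiple of slot_align. */
    slot = (slot + slot_align - 1) / slot_align * slot_align;
    return nqueues * slot;
}

/* Bytes used by a selector-style layout: one selector, one pfn and one
 * num register, independent of how many queues the device has. */
static unsigned selector_layout_bytes(void)
{
    return 2u /* assumed 2-byte selector */ + SKETCH_PFN_BYTES + SKETCH_NUM_BYTES;
}
```

With these assumptions, 8 queues come to 32, 48, or 64 bytes in the static layouts, and 64 queues with padded slots would need 512 bytes, while the selector layout stays at 8 bytes regardless of queue count.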
Regards,
Anthony Liguori