From: Yuval Shaia <yuval.shaia@oracle.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Cornelia Huck <cohuck@redhat.com>, mst@redhat.com, linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, jgg@mellanox.com
Subject: Re: [Qemu-devel] [RFC 0/3] VirtIO RDMA
Date: Tue, 30 Apr 2019 15:16:25 +0300
Message-ID: <20190430121624.GA8708@lap1>
In-Reply-To: <e73e03c2-ea2b-6ffc-cd23-e8e44d42ce80@suse.de>

On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > >
> > > > Data center backends use more and more RDMA or RoCE devices and more and
> > > > more software runs in virtualized environment.
> > > > There is a need for a standard to enable RDMA/RoCE on Virtual Machines.
> > > >
> > > > Virtio is the optimal solution since is the de-facto para-virtualizaton
> > > > technology and also because the Virtio specification
> > > > allows Hardware Vendors to support Virtio protocol natively in order to
> > > > achieve bare metal performance.
> > > >
> > > > This RFC is an effort to addresses challenges in defining the RDMA/RoCE
> > > > Virtio Specification and a look forward on possible implementation
> > > > techniques.
> > > >
> > > > Open issues/Todo list:
> > > > List is huge, this is only start point of the project.
> > > > Anyway, here is one example of item in the list:
> > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > in order to support for example 32K QPs we will need 64K VirtQ. Not sure
> > > > that this is reasonable so one option is to have one for all and
> > > > multiplex the traffic on it. This is not good approach as by design it
> > > > introducing an optional starvation. Another approach would be multi
> > > > queues and round-robin (for example) between them.
> > > >
> Typically there will be a one-to-one mapping between QPs and CPUs (on the
> guest). So while one would need to be prepared to support quite some QPs,
> the expectation is that the actual number of QPs used will be rather low.
> In a similar vein, multiplexing QPs would be defeating the purpose, as the
> overall idea was to have _independent_ QPs to enhance parallelism.

Since Jason already addressed this issue, I'll skip it.

>
> > > > Expectations from this posting:
> > > > In general, any comment is welcome, starting from hey, drop this as it is a
> > > > very bad idea, to yeah, go ahead, we really want it.
> > > > Idea here is that since it is not a minor effort i first want to know if
> > > > there is some sort interest in the community for such device.
> > >
> > > My first reaction is: Sounds sensible, but it would be good to have a
> > > spec for this :)
> > >
> > > You'll need a spec if you want this to go forward anyway, so at least a
> > > sketch would be good to answer questions such as how many virtqueues
> > > you use for which purpose, what is actually put on the virtqueues,
> > > whether there are negotiable features, and what the expectations for
> > > the device and the driver are. It also makes it easier to understand
> > > how this is supposed to work in practice.
> > >
> > > If folks agree that this sounds useful, the next step would be to
> > > reserve an id for the device type.
> >
> > Thanks for the tips, will sure do that, it is that first i wanted to make
> > sure there is a use case here.
> >
> > Waiting for any feedback from the community.
> >
> I really do like the ides; in fact, it saved me from coding a similar thing
> myself :-)

Isn't that the great thing about open source :-)

>
> However, I'm still curious about the overall intent of this driver. Where
> would the I/O be routed _to_ ?
> It's nice that we have a virtualized driver, but this driver is
> intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> And this I/O needs to be send to (and possibly received from)
> something.

The idea is to have a virtio-rdma device emulation (patch #2) on the host
that relays the traffic to the real HW on the host.

It would also be good to have a design that allows virtio HW to be plugged
into the host and use the same driver. In that case the emulated device
would not be needed: the driver would "attach" to the virtqueues exposed by
the virtio HW instead of those of the emulated RDMA device.

I don't know of any public virtio-rdma HW.

>
> So what exactly is this something?
> An existing piece of HW on the host?
> If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> by using virtio-mdev?

vfio needs to be implemented by every HW vendor, whereas this approach is
generic and does not depend on the HW.

SR-IOV has its limitations.

As for virtio-mdev, sorry, I don't know it; can you elaborate?

>
> Another guest?

No

> If so, how would we route the I/O from one guest to the other?
> Shared memory? Implementing a full-blown RDMA switch in qemu?
>
> Oh, and I would _love_ to have a discussion about this at KVM Forum.
> Maybe I'll manage to whip up guest-to-guest RDMA connection using ivshmem
> ... let's see.

Well, I've posted a proposal for a talk, let's see if it'll be accepted.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Teamlead Storage & Networking
> hare@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
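
A rough illustration of the "Multi VirtQ" item quoted in this message, not taken from the RFC patches: the sketch below works through the virtqueue count for dedicated rings (32K QPs with two rings each) and shows one possible round-robin pass over QPs sharing a single queue. All names here (`RINGS_PER_QP`, `virtqueues_needed`, `round_robin_drain`) are hypothetical and exist only for this example; completion queues are left out of the count, and the real device may well schedule differently.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Assumption for illustration: each QP has two rings, as stated in the
# thread. CQ rings are deliberately not counted here.
RINGS_PER_QP = 2


def virtqueues_needed(num_qps: int) -> int:
    """Virtqueues required if every QP ring gets its own virtqueue."""
    return num_qps * RINGS_PER_QP


def round_robin_drain(pending: Dict[int, Deque[str]],
                      budget: int) -> List[Tuple[int, str]]:
    """Serve at most `budget` work requests, one per QP per pass.

    Taking a single request from each QP in turn keeps one busy QP from
    starving the others when they share a queue.
    """
    served: List[Tuple[int, str]] = []
    # Only QPs that currently have pending work take part in the rotation.
    active = deque(qp for qp in sorted(pending) if pending[qp])
    while active and len(served) < budget:
        qp = active.popleft()
        served.append((qp, pending[qp].popleft()))
        if pending[qp]:
            # Still has work: go to the back of the rotation.
            active.append(qp)
    return served
```

With these assumptions, `virtqueues_needed(32 * 1024)` gives 65536, which matches the 64K figure in the quoted text. The rotation trades some per-QP latency for fairness; as Hannes notes in his reply, it also gives up the independence between QPs that dedicated queues would provide.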