From: Yuval Shaia <yuval.shaia@oracle.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Hannes Reinecke <hare@suse.de>, Cornelia Huck <cohuck@redhat.com>,
	mst@redhat.com, linux-rdma@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org
Subject: Re: [Qemu-devel] [RFC 0/3] VirtIO RDMA
Date: Tue, 30 Apr 2019 20:13:54 +0300	[thread overview]
Message-ID: <20190430171350.GA2763@lap1> (raw)
In-Reply-To: <20190422164527.GF21588@ziepe.ca>

On Mon, Apr 22, 2019 at 01:45:27PM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2019 at 01:16:06PM +0200, Hannes Reinecke wrote:
> > On 4/15/19 12:35 PM, Yuval Shaia wrote:
> > > On Thu, Apr 11, 2019 at 07:02:15PM +0200, Cornelia Huck wrote:
> > > > On Thu, 11 Apr 2019 14:01:54 +0300
> > > > Yuval Shaia <yuval.shaia@oracle.com> wrote:
> > > > 
> > > > > Data center backends use more and more RDMA or RoCE devices, and more
> > > > > and more software runs in virtualized environments.
> > > > > There is a need for a standard to enable RDMA/RoCE on virtual machines.
> > > > > 
> > > > > Virtio is the optimal solution since it is the de-facto para-virtualization
> > > > > technology, and also because the Virtio specification
> > > > > allows hardware vendors to support the Virtio protocol natively in order to
> > > > > achieve bare-metal performance.
> > > > > 
> > > > > This RFC is an effort to address the challenges in defining the RDMA/RoCE
> > > > > Virtio specification and to look ahead at possible implementation
> > > > > techniques.
> > > > > 
> > > > > Open issues/Todo list:
> > > > > The list is huge; this is only the starting point of the project.
> > > > > Anyway, here is one example of an item on the list:
> > > > > - Multi VirtQ: Every QP has two rings and every CQ has one. This means that
> > > > >    in order to support, for example, 32K QPs we will need 64K virtqueues. Not
> > > > >    sure that this is reasonable, so one option is to have a single virtqueue
> > > > >    for all and multiplex the traffic on it. This is not a good approach, as
> > > > >    by design it introduces potential starvation. Another approach would be
> > > > >    multiple queues with round-robin (for example) between them.
> > > > > 
> > Typically there will be a one-to-one mapping between QPs and CPUs (on the
> > guest). 
> 
> Er, we are really overloading words here. The typical expectation is
> that an 'RDMA QP' will have thousands and thousands of instances on a
> system.
> 
> Most likely I think mapping 1:1 a virtio queue to a 'RDMA QP, CQ, SRQ,
> etc' is a bad idea...

We have three options: no virtqueue per QP, a 1:1 mapping, or multiplexing.
What would be your vote on that?
I think you are for option #1, right? But in that case there is actually no
point in having a virtio driver, is there?
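
Just to make the trade-off concrete, here is a rough sketch of how the two
extremes could look from the device side. This is illustration only: no
virtio-rdma spec exists yet, and every structure and field name below
(virtio_rdma_config_per_qp, virtio_rdma_mux_hdr, ...) is made up.

#include <linux/types.h>

/* Option "1:1": every QP owns a send and a recv virtqueue, so the queue
 * count grows as 2 * max_qp, plus one virtqueue per CQ. */
struct virtio_rdma_config_per_qp {
        __le32 max_qp;          /* e.g. 32K QPs -> 64K virtqueues */
        __le32 max_cq;          /* plus one virtqueue per CQ */
};

/* Option "multiplex": a small fixed pool of virtqueues; every work
 * request carries the QP it belongs to and the device demultiplexes,
 * e.g. round-robin between the queues to avoid starvation. */
struct virtio_rdma_mux_hdr {
        __le32 qp_num;          /* target QP for this work request */
        __le32 opcode;          /* send/recv/rdma-read/rdma-write/... */
};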

> 
> > However, I'm still curious about the overall intent of this driver. Where
> > would the I/O be routed _to_ ?
> > It's nice that we have a virtualized driver, but this driver is
> > intended to do I/O (even if it doesn't _do_ any I/O ATM :-)
> > And this I/O needs to be sent to (and possibly received from)
> > something.
> 
> As yet I have never heard of public RDMA HW that could be coupled to a
> virtio scheme. All HW defines its own queue ring buffer formats
> without standardization.

With virtio, now is the time to have a standard, don't you agree?
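
A standard could stay at the verbs level and leave the data-path ring format
to the device. As a purely hypothetical illustration (none of these commands
or names appear in any published spec), a control-queue command for QP
creation might look roughly like virtio-net's control commands:

#include <linux/types.h>

/* Hypothetical guest -> device control command. */
struct virtio_rdma_ctrl_create_qp {
        __le32 pd_handle;       /* protection domain */
        __le32 send_cq_handle;
        __le32 recv_cq_handle;
        __le32 max_send_wr;
        __le32 max_recv_wr;
        __le32 qp_type;         /* RC, UD, ... */
};

/* Hypothetical device -> guest response. */
struct virtio_rdma_ctrl_create_qp_resp {
        __le32 status;
        __le32 qp_handle;       /* handle the guest driver uses from now on */
};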

> 
> > If so, wouldn't it be more efficient to use vfio, either by using SR-IOV or
> > by using virtio-mdev?
> 
> Using PCI pass through means the guest has to have drivers for the
> device. A generic, perhaps slower, virtio path has some appeal in some
> cases.

From the experience we have with other emulated devices, the gap gets
smaller as the message size gets larger. For example, with a 2M message
size the emulated device gives close to line-rate performance.
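To put a rough number on it: at, say, a 100 Gb/s line rate (just an example
figure), 2M messages mean only about 100e9 / 8 / 2M ~= 6,000 messages per
second, so the fixed per-message emulation cost is amortized over a large
payload.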

> 
> > If so, how would we route the I/O from one guest to the other?
> > Shared memory? Implementing a full-blown RDMA switch in qemu?
> 
> RoCE rides over the existing Ethernet switching layer that qemu plugs
> into.
> 
> So if you built a shared-memory, local-host-only virtio-rdma, then you'd
> probably run through the Ethernet switch upon connection establishment
> to match the participating VMs.

Or you may use an enhanced rxe device, which bypasses the Ethernet layer and
performs a fast copy, as the backend device for the emulated virtio-rdma
device.
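
To illustrate the idea (sketch only: backend_post_send and its parameters
are made up, only the ibv_* calls are the real libibverbs API), whatever
the backend is - rxe, an enhanced rxe, or real RoCE hardware - the device
emulation in qemu could translate a work request popped from the guest's
send virtqueue into a plain verbs post:

#include <stdint.h>
#include <infiniband/verbs.h>

/* Host-side sketch: repost a guest work request via standard verbs. */
static int backend_post_send(struct ibv_qp *qp, uint64_t host_addr,
                             uint32_t len, uint32_t lkey, uint64_t wr_id)
{
        struct ibv_sge sge = {
                .addr   = host_addr,    /* guest address already translated */
                .length = len,
                .lkey   = lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id      = wr_id,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,
                .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr *bad_wr;

        return ibv_post_send(qp, &wr, &bad_wr);
}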

> 
> Jason

Thread overview: 51+ messages

2019-04-11 11:01 [Qemu-devel] [RFC 0/3] VirtIO RDMA Yuval Shaia
2019-04-11 11:01 ` [Qemu-devel] [RFC 1/3] virtio-net: Move some virtio-net-pci decl to include/hw/virtio Yuval Shaia
2019-04-11 11:01 ` [Qemu-devel] [RFC 2/3] hw/virtio-rdma: VirtIO rdma device Yuval Shaia
2019-04-19 23:20   ` Michael S. Tsirkin
2019-04-23  7:59     ` Cornelia Huck
2019-04-11 11:01 ` [Qemu-devel] [RFC 3/3] RDMA/virtio-rdma: VirtIO rdma driver Yuval Shaia
2019-04-13  7:58   ` Yanjun Zhu
2019-04-14  5:20     ` Yuval Shaia
2019-04-16  1:07   ` Bart Van Assche
2019-04-16  8:56     ` Yuval Shaia
2019-04-11 17:02 ` [Qemu-devel] [RFC 0/3] VirtIO RDMA Cornelia Huck
2019-04-11 17:24   ` Jason Gunthorpe
2019-04-11 17:34     ` Yuval Shaia
2019-04-11 17:40       ` Jason Gunthorpe
2019-04-15 10:04         ` Yuval Shaia
2019-04-11 17:41       ` Yuval Shaia
2019-04-12  9:51         ` Devesh Sharma
2019-04-15 10:27           ` Yuval Shaia
2019-04-15 10:35   ` Yuval Shaia
2019-04-19 11:16     ` Hannes Reinecke
2019-04-22  6:00       ` Leon Romanovsky
2019-04-30 17:16         ` Yuval Shaia
2019-04-22 16:45       ` Jason Gunthorpe
2019-04-30 17:13         ` Yuval Shaia [this message]
2019-05-07 19:43           ` Jason Gunthorpe
2019-04-30 12:16       ` Yuval Shaia
