qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, imammedo@redhat.com,
	yuval.shaia@oracle.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation
Date: Tue, 19 Dec 2017 20:05:18 +0200	[thread overview]
Message-ID: <20171219194951-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20171217125457.3429-1-marcel@redhat.com>

On Sun, Dec 17, 2017 at 02:54:52PM +0200, Marcel Apfelbaum wrote:
> RFC -> V2:
>  - Full implementation of the pvrdma device
>  - Backend is an ibdevice interface; the KDBR module is no longer needed
> 
> General description
> ===================
> PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
> It works with the device's existing Linux kernel driver as-is; no special
> guest modifications are needed.
> 
> While it is compatible with the VMware device, it can also communicate with
> bare-metal RDMA-enabled machines. It does not require an RDMA HCA in the
> host; it can work with Soft-RoCE (rxe).
> 
> It does not require the whole guest RAM to be pinned,

What happens if the guest attempts to register all its memory?

> allowing memory overcommit. Although it is not implemented yet,
> migration support will be possible with some HW assistance.

What does "HW assistance" mean here?
Can it work with any existing hardware?

> 
> Design
> ======
>  - Follows the behavior of VMware's pvrdma device, but is not tightly
>    coupled to it

Everything seems to be in pvrdma. Since it's not coupled, could you
split the code into pvrdma-specific and generic parts?

>    and most of the code can be reused if we decide to
>    move on to a virtio-based RDMA device.

I suspect that without virtio we won't be able to do any future
extensions.

>  - It exposes 3 BARs:
>     BAR 0 - MSI-X, uses 3 vectors for the command ring, async events and
>             completions
>     BAR 1 - Configuration of registers

What does this mean?

>     BAR 2 - UAR, used to pass HW commands from the driver.

A detailed description of the above belongs in the documentation.

>  - The device performs internal management of the RDMA
>    resources (PDs, CQs, QPs, ...), meaning the objects
>    are not directly coupled to physical RDMA device resources.

I am wondering how you make connections: QP#s are exposed on
the wire during connection management.

> The pvrdma backend is an ibdevice interface that can be exposed
> either by a Soft-RoCE (rxe) device on machines with no RDMA device,
> or by an HCA SR-IOV function (VF/PF).
> Note that ibdevice interfaces can't be shared between pvrdma devices;
> each one requires a separate instance (rxe or SR-IOV VF).

So what's the advantage of this over pass-through then?


> 
> Tests and performance
> =====================
> Tested with the Soft-RoCE (rxe) backend and with Mellanox ConnectX-3
> and ConnectX-4 HCAs, with:
>   - VMs on the same host
>   - VMs on different hosts
>   - VMs to bare metal
> 
> The best performance was achieved with ConnectX HCAs and buffer sizes
> larger than 1MB, reaching line rate (~50Gb/s).
> The conclusion is that, with the PVRDMA device, there is no actual
> performance penalty compared to bare metal for large enough buffers
> (which are quite common when using RDMA), while still allowing
> memory overcommit.
> 
> Marcel Apfelbaum (3):
>   mem: add share parameter to memory-backend-ram
>   docs: add pvrdma device documentation.
>   MAINTAINERS: add entry for hw/net/pvrdma
> 
> Yuval Shaia (2):
>   pci/shpc: Move function to generic header file
>   pvrdma: initial implementation
> 
>  MAINTAINERS                         |   7 +
>  Makefile.objs                       |   1 +
>  backends/hostmem-file.c             |  25 +-
>  backends/hostmem-ram.c              |   4 +-
>  backends/hostmem.c                  |  21 +
>  configure                           |   9 +-
>  default-configs/arm-softmmu.mak     |   2 +
>  default-configs/i386-softmmu.mak    |   1 +
>  default-configs/x86_64-softmmu.mak  |   1 +
>  docs/pvrdma.txt                     | 145 ++++++
>  exec.c                              |  26 +-
>  hw/net/Makefile.objs                |   7 +
>  hw/net/pvrdma/pvrdma.h              | 179 +++++++
>  hw/net/pvrdma/pvrdma_backend.c      | 986 ++++++++++++++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_backend.h      |  74 +++
>  hw/net/pvrdma/pvrdma_backend_defs.h |  68 +++
>  hw/net/pvrdma/pvrdma_cmd.c          | 338 ++++++++++++
>  hw/net/pvrdma/pvrdma_defs.h         | 121 +++++
>  hw/net/pvrdma/pvrdma_dev_api.h      | 580 +++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_dev_ring.c     | 138 +++++
>  hw/net/pvrdma/pvrdma_dev_ring.h     |  42 ++
>  hw/net/pvrdma/pvrdma_ib_verbs.h     | 399 +++++++++++++++
>  hw/net/pvrdma/pvrdma_main.c         | 664 ++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_qp_ops.c       | 187 +++++++
>  hw/net/pvrdma/pvrdma_qp_ops.h       |  26 +
>  hw/net/pvrdma/pvrdma_ring.h         | 134 +++++
>  hw/net/pvrdma/pvrdma_rm.c           | 791 +++++++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_rm.h           |  54 ++
>  hw/net/pvrdma/pvrdma_rm_defs.h      | 111 ++++
>  hw/net/pvrdma/pvrdma_types.h        |  37 ++
>  hw/net/pvrdma/pvrdma_utils.c        | 133 +++++
>  hw/net/pvrdma/pvrdma_utils.h        |  41 ++
>  hw/net/pvrdma/trace-events          |   9 +
>  hw/pci/shpc.c                       |  11 +-
>  include/exec/memory.h               |  23 +
>  include/exec/ram_addr.h             |   3 +-
>  include/hw/pci/pci_ids.h            |   3 +
>  include/qemu/cutils.h               |  10 +
>  include/qemu/osdep.h                |   2 +-
>  include/sysemu/hostmem.h            |   2 +-
>  include/sysemu/kvm.h                |   2 +-
>  memory.c                            |  16 +-
>  util/oslib-posix.c                  |   4 +-
>  util/oslib-win32.c                  |   2 +-
>  44 files changed, 5378 insertions(+), 61 deletions(-)
>  create mode 100644 docs/pvrdma.txt
>  create mode 100644 hw/net/pvrdma/pvrdma.h
>  create mode 100644 hw/net/pvrdma/pvrdma_backend.c
>  create mode 100644 hw/net/pvrdma/pvrdma_backend.h
>  create mode 100644 hw/net/pvrdma/pvrdma_backend_defs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_cmd.c
>  create mode 100644 hw/net/pvrdma/pvrdma_defs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_dev_api.h
>  create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.c
>  create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.h
>  create mode 100644 hw/net/pvrdma/pvrdma_ib_verbs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_main.c
>  create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.c
>  create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.h
>  create mode 100644 hw/net/pvrdma/pvrdma_ring.h
>  create mode 100644 hw/net/pvrdma/pvrdma_rm.c
>  create mode 100644 hw/net/pvrdma/pvrdma_rm.h
>  create mode 100644 hw/net/pvrdma/pvrdma_rm_defs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_types.h
>  create mode 100644 hw/net/pvrdma/pvrdma_utils.c
>  create mode 100644 hw/net/pvrdma/pvrdma_utils.h
>  create mode 100644 hw/net/pvrdma/trace-events
> 
> -- 
> 2.13.5


Thread overview: 30+ messages
2017-12-17 12:54 [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 1/5] pci/shpc: Move function to generic header file Marcel Apfelbaum
2017-12-17 18:16   ` Philippe Mathieu-Daudé
2017-12-17 19:03     ` Yuval Shaia
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 2/5] mem: add share parameter to memory-backend-ram Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 3/5] docs: add pvrdma device documentation Marcel Apfelbaum
2017-12-19 17:47   ` Michael S. Tsirkin
2017-12-20 14:45     ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 4/5] pvrdma: initial implementation Marcel Apfelbaum
2017-12-19 16:12   ` Michael S. Tsirkin
2017-12-19 17:29     ` Marcel Apfelbaum
2017-12-19 17:48   ` Michael S. Tsirkin
2017-12-20 15:25     ` Yuval Shaia
2017-12-20 18:01       ` Michael S. Tsirkin
2017-12-19 19:13   ` Philippe Mathieu-Daudé
2017-12-20  4:08     ` Michael S. Tsirkin
2017-12-20 14:46       ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 5/5] MAINTAINERS: add entry for hw/net/pvrdma Marcel Apfelbaum
2017-12-19 17:49   ` Michael S. Tsirkin
2017-12-19 18:05 ` Michael S. Tsirkin [this message]
2017-12-20 15:07   ` [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2017-12-21  0:05     ` Michael S. Tsirkin
2017-12-21  7:27       ` Yuval Shaia
2017-12-21 14:22         ` Michael S. Tsirkin
2017-12-21 15:59           ` Marcel Apfelbaum
2017-12-21 20:46             ` Michael S. Tsirkin
2017-12-21 22:30               ` Yuval Shaia
2017-12-22  4:58                 ` Marcel Apfelbaum
2017-12-20 17:56   ` Yuval Shaia
2017-12-20 18:05     ` Michael S. Tsirkin
