From: "Michael S. Tsirkin" <mst@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, imammedo@redhat.com,
yuval.shaia@oracle.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation
Date: Tue, 19 Dec 2017 20:05:18 +0200 [thread overview]
Message-ID: <20171219194951-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20171217125457.3429-1-marcel@redhat.com>
On Sun, Dec 17, 2017 at 02:54:52PM +0200, Marcel Apfelbaum wrote:
> RFC -> V2:
> - Full implementation of the pvrdma device
> - Backend is an ibdevice interface, no need for the KDBR module
>
> General description
> ===================
> PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
> It works with its Linux Kernel driver AS IS, no need for any special guest
> modifications.
>
> While it complies with the VMware device, it can also communicate with bare
> metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
> can work with Soft-RoCE (rxe).
>
> It does not require the whole guest RAM to be pinned
What happens if guest attempts to register all its memory?
> allowing memory
> over-commit
> and, even if not implemented yet, migration support will be
> possible with some HW assistance.
What does "HW assistance" mean here?
Can it work with any existing hardware?
>
> Design
> ======
> - Follows the behavior of VMware's pvrdma device, however is not tightly
> coupled with it
Everything seems to be in pvrdma. Since it's not coupled, could you
split code to pvrdma specific and generic parts?
> and most of the code can be reused if we decide to
> continue to a Virtio based RDMA device.
I suspect that without virtio we won't be able to do any future
extensions.
> - It exposes 3 BARs:
> BAR 0 - MSIX, utilize 3 vectors for command ring, async events and
> completions
> BAR 1 - Configuration of registers
What does this mean?
> BAR 2 - UAR, used to pass HW commands from driver.
A detailed description of above belongs in documentation.
> - The device performs internal management of the RDMA
> resources (PDs, CQs, QPs, ...), meaning the objects
> are not directly coupled to a physical RDMA device resources.
I am wondering how do you make connections? QP#s are exposed on
the wire during connection management.
> The pvrdma backend is an ibdevice interface that can be exposed
> either by a Soft-RoCE(rxe) device on machines with no RDMA device,
> or an HCA SRIOV function(VF/PF).
> Note that ibdevice interfaces can't be shared between pvrdma devices,
> each one requiring a separate instance (rxe or SRIOV VF).
So what's the advantage of this over pass-through then?
>
> Tests and performance
> =====================
> Tested with SoftRoCE backend (rxe)/Mellanox ConnectX3,
> and Mellanox ConnectX4 HCAs with:
> - VMs in the same host
> - VMs in different hosts
> - VMs to bare metal.
>
> The best performance achieved with ConnectX HCAs and buffer size
> bigger than 1MB which was the line rate ~ 50Gb/s.
> The conclusion is that using the PVRDMA device there are no
> actual performance penalties compared to bare metal for big enough
> buffers (which is quite common when using RDMA), while allowing
> memory overcommit.
>
> Marcel Apfelbaum (3):
> mem: add share parameter to memory-backend-ram
> docs: add pvrdma device documentation.
> MAINTAINERS: add entry for hw/net/pvrdma
>
> Yuval Shaia (2):
> pci/shpc: Move function to generic header file
> pvrdma: initial implementation
>
> MAINTAINERS | 7 +
> Makefile.objs | 1 +
> backends/hostmem-file.c | 25 +-
> backends/hostmem-ram.c | 4 +-
> backends/hostmem.c | 21 +
> configure | 9 +-
> default-configs/arm-softmmu.mak | 2 +
> default-configs/i386-softmmu.mak | 1 +
> default-configs/x86_64-softmmu.mak | 1 +
> docs/pvrdma.txt | 145 ++++++
> exec.c | 26 +-
> hw/net/Makefile.objs | 7 +
> hw/net/pvrdma/pvrdma.h | 179 +++++++
> hw/net/pvrdma/pvrdma_backend.c | 986 ++++++++++++++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_backend.h | 74 +++
> hw/net/pvrdma/pvrdma_backend_defs.h | 68 +++
> hw/net/pvrdma/pvrdma_cmd.c | 338 ++++++++++++
> hw/net/pvrdma/pvrdma_defs.h | 121 +++++
> hw/net/pvrdma/pvrdma_dev_api.h | 580 +++++++++++++++++++++
> hw/net/pvrdma/pvrdma_dev_ring.c | 138 +++++
> hw/net/pvrdma/pvrdma_dev_ring.h | 42 ++
> hw/net/pvrdma/pvrdma_ib_verbs.h | 399 +++++++++++++++
> hw/net/pvrdma/pvrdma_main.c | 664 ++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_qp_ops.c | 187 +++++++
> hw/net/pvrdma/pvrdma_qp_ops.h | 26 +
> hw/net/pvrdma/pvrdma_ring.h | 134 +++++
> hw/net/pvrdma/pvrdma_rm.c | 791 +++++++++++++++++++++++++++++
> hw/net/pvrdma/pvrdma_rm.h | 54 ++
> hw/net/pvrdma/pvrdma_rm_defs.h | 111 ++++
> hw/net/pvrdma/pvrdma_types.h | 37 ++
> hw/net/pvrdma/pvrdma_utils.c | 133 +++++
> hw/net/pvrdma/pvrdma_utils.h | 41 ++
> hw/net/pvrdma/trace-events | 9 +
> hw/pci/shpc.c | 11 +-
> include/exec/memory.h | 23 +
> include/exec/ram_addr.h | 3 +-
> include/hw/pci/pci_ids.h | 3 +
> include/qemu/cutils.h | 10 +
> include/qemu/osdep.h | 2 +-
> include/sysemu/hostmem.h | 2 +-
> include/sysemu/kvm.h | 2 +-
> memory.c | 16 +-
> util/oslib-posix.c | 4 +-
> util/oslib-win32.c | 2 +-
> 44 files changed, 5378 insertions(+), 61 deletions(-)
> create mode 100644 docs/pvrdma.txt
> create mode 100644 hw/net/pvrdma/pvrdma.h
> create mode 100644 hw/net/pvrdma/pvrdma_backend.c
> create mode 100644 hw/net/pvrdma/pvrdma_backend.h
> create mode 100644 hw/net/pvrdma/pvrdma_backend_defs.h
> create mode 100644 hw/net/pvrdma/pvrdma_cmd.c
> create mode 100644 hw/net/pvrdma/pvrdma_defs.h
> create mode 100644 hw/net/pvrdma/pvrdma_dev_api.h
> create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.c
> create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.h
> create mode 100644 hw/net/pvrdma/pvrdma_ib_verbs.h
> create mode 100644 hw/net/pvrdma/pvrdma_main.c
> create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.c
> create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.h
> create mode 100644 hw/net/pvrdma/pvrdma_ring.h
> create mode 100644 hw/net/pvrdma/pvrdma_rm.c
> create mode 100644 hw/net/pvrdma/pvrdma_rm.h
> create mode 100644 hw/net/pvrdma/pvrdma_rm_defs.h
> create mode 100644 hw/net/pvrdma/pvrdma_types.h
> create mode 100644 hw/net/pvrdma/pvrdma_utils.c
> create mode 100644 hw/net/pvrdma/pvrdma_utils.h
> create mode 100644 hw/net/pvrdma/trace-events
>
> --
> 2.13.5
next prev parent reply other threads:[~2017-12-19 18:05 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-17 12:54 [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 1/5] pci/shpc: Move function to generic header file Marcel Apfelbaum
2017-12-17 18:16 ` Philippe Mathieu-Daudé
2017-12-17 19:03 ` Yuval Shaia
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 2/5] mem: add share parameter to memory-backend-ram Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 3/5] docs: add pvrdma device documentation Marcel Apfelbaum
2017-12-19 17:47 ` Michael S. Tsirkin
2017-12-20 14:45 ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 4/5] pvrdma: initial implementation Marcel Apfelbaum
2017-12-19 16:12 ` Michael S. Tsirkin
2017-12-19 17:29 ` Marcel Apfelbaum
2017-12-19 17:48 ` Michael S. Tsirkin
2017-12-20 15:25 ` Yuval Shaia
2017-12-20 18:01 ` Michael S. Tsirkin
2017-12-19 19:13 ` Philippe Mathieu-Daudé
2017-12-20 4:08 ` Michael S. Tsirkin
2017-12-20 14:46 ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 5/5] MAINTAINERS: add entry for hw/net/pvrdma Marcel Apfelbaum
2017-12-19 17:49 ` Michael S. Tsirkin
2017-12-19 18:05 ` Michael S. Tsirkin [this message]
2017-12-20 15:07 ` [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2017-12-21 0:05 ` Michael S. Tsirkin
2017-12-21 7:27 ` Yuval Shaia
2017-12-21 14:22 ` Michael S. Tsirkin
2017-12-21 15:59 ` Marcel Apfelbaum
2017-12-21 20:46 ` Michael S. Tsirkin
2017-12-21 22:30 ` Yuval Shaia
2017-12-22 4:58 ` Marcel Apfelbaum
2017-12-20 17:56 ` Yuval Shaia
2017-12-20 18:05 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171219194951-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=ehabkost@redhat.com \
--cc=imammedo@redhat.com \
--cc=marcel@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yuval.shaia@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.