From: Cornelia Huck <cohuck@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, imammedo@redhat.com,
	yuval.shaia@oracle.com, pbonzini@redhat.com, mst@redhat.com,
	borntraeger@de.ibm.com
Subject: Re: [Qemu-devel] [PATCH V6 4/5] pvrdma: initial implementation
Date: Tue, 9 Jan 2018 11:39:11 +0100
Message-ID: <20180109113911.1746995b.cohuck@redhat.com>
In-Reply-To: <20180107123224.100877-5-marcel@redhat.com>

On Sun,  7 Jan 2018 14:32:23 +0200
Marcel Apfelbaum <marcel@redhat.com> wrote:

> From: Yuval Shaia <yuval.shaia@oracle.com>
> 
> PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
> It works with its Linux Kernel driver AS IS, no need for any special guest
> modifications.
> 
> While it complies with the VMware device, it can also communicate with bare
> metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
> can work with Soft-RoCE (rxe).
> 
> It does not require the whole guest RAM to be pinned allowing memory
> over-commit and, even if not implemented yet, migration support will be
> possible with some HW assistance.
> 
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
> ---
>  Makefile.objs                      |   2 +
>  configure                          |   9 +-
>  default-configs/arm-softmmu.mak    |   1 +
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  hw/Makefile.objs                   |   1 +
>  hw/rdma/Makefile.objs              |   6 +
>  hw/rdma/rdma_backend.c             | 815 +++++++++++++++++++++++++++++++++++++
>  hw/rdma/rdma_backend.h             |  92 +++++
>  hw/rdma/rdma_backend_defs.h        |  62 +++
>  hw/rdma/rdma_rm.c                  | 619 ++++++++++++++++++++++++++++
>  hw/rdma/rdma_rm.h                  |  69 ++++
>  hw/rdma/rdma_rm_defs.h             | 106 +++++
>  hw/rdma/rdma_utils.c               |  52 +++
>  hw/rdma/rdma_utils.h               |  43 ++
>  hw/rdma/trace-events               |   5 +
>  hw/rdma/vmw/pvrdma.h               | 122 ++++++
>  hw/rdma/vmw/pvrdma_cmd.c           | 679 ++++++++++++++++++++++++++++++
>  hw/rdma/vmw/pvrdma_dev_api.h       | 602 +++++++++++++++++++++++++++
>  hw/rdma/vmw/pvrdma_dev_ring.c      | 139 +++++++
>  hw/rdma/vmw/pvrdma_dev_ring.h      |  42 ++
>  hw/rdma/vmw/pvrdma_ib_verbs.h      | 433 ++++++++++++++++++++
>  hw/rdma/vmw/pvrdma_main.c          | 644 +++++++++++++++++++++++++++++
>  hw/rdma/vmw/pvrdma_qp_ops.c        | 212 ++++++++++
>  hw/rdma/vmw/pvrdma_qp_ops.h        |  27 ++
>  hw/rdma/vmw/pvrdma_ring.h          | 134 ++++++
>  hw/rdma/vmw/trace-events           |   5 +
>  hw/rdma/vmw/vmw_pvrdma-abi.h       | 311 ++++++++++++++
>  include/hw/pci/pci_ids.h           |   3 +
>  29 files changed, 5233 insertions(+), 4 deletions(-)
>  create mode 100644 hw/rdma/Makefile.objs
>  create mode 100644 hw/rdma/rdma_backend.c
>  create mode 100644 hw/rdma/rdma_backend.h
>  create mode 100644 hw/rdma/rdma_backend_defs.h
>  create mode 100644 hw/rdma/rdma_rm.c
>  create mode 100644 hw/rdma/rdma_rm.h
>  create mode 100644 hw/rdma/rdma_rm_defs.h
>  create mode 100644 hw/rdma/rdma_utils.c
>  create mode 100644 hw/rdma/rdma_utils.h
>  create mode 100644 hw/rdma/trace-events
>  create mode 100644 hw/rdma/vmw/pvrdma.h
>  create mode 100644 hw/rdma/vmw/pvrdma_cmd.c
>  create mode 100644 hw/rdma/vmw/pvrdma_dev_api.h
>  create mode 100644 hw/rdma/vmw/pvrdma_dev_ring.c
>  create mode 100644 hw/rdma/vmw/pvrdma_dev_ring.h
>  create mode 100644 hw/rdma/vmw/pvrdma_ib_verbs.h
>  create mode 100644 hw/rdma/vmw/pvrdma_main.c
>  create mode 100644 hw/rdma/vmw/pvrdma_qp_ops.c
>  create mode 100644 hw/rdma/vmw/pvrdma_qp_ops.h
>  create mode 100644 hw/rdma/vmw/pvrdma_ring.h
>  create mode 100644 hw/rdma/vmw/trace-events
>  create mode 100644 hw/rdma/vmw/vmw_pvrdma-abi.h

(...)

> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index b0d6e65038..0e7a3c1700 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -132,3 +132,4 @@ CONFIG_GPIO_KEY=y
>  CONFIG_MSF2=y
>  CONFIG_FW_CFG_DMA=y
>  CONFIG_XILINX_AXI=y
> +CONFIG_PVRDMA=y
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index 95ac4b464a..88298e4ef5 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -61,3 +61,4 @@ CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
>  CONFIG_PXB=y
>  CONFIG_ACPI_VMGENID=y
>  CONFIG_FW_CFG_DMA=y
> +CONFIG_PVRDMA=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 0221236825..f571da36eb 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -61,3 +61,4 @@ CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
>  CONFIG_PXB=y
>  CONFIG_ACPI_VMGENID=y
>  CONFIG_FW_CFG_DMA=y
> +CONFIG_PVRDMA=y

Any reason you did not add this to other architectures?

I added "CONFIG_PVRDMA=$(CONFIG_PCI)" to s390x-softmmu.mak, and it at
least builds (did not try to actually get it to work, although I don't
see any immediate blocker for that).

(...)

> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> new file mode 100644
> index 0000000000..dcb799f49b
> --- /dev/null
> +++ b/hw/rdma/rdma_backend.c

(...)

> +static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq,
> +                    bool one_poll)
> +{
> +    int i, ne;
> +    BackendCtx *bctx;
> +    struct ibv_wc wc[2];
> +
> +    pr_dbg("Entering poll_cq loop on cq %p\n", ibcq);
> +    do {
> +        ne = ibv_poll_cq(ibcq, 2, wc);
> +        if (ne == 0 && one_poll) {
> +            pr_dbg("CQ is empty\n");
> +            return;
> +        }
> +    } while (ne < 0);
> +
> +    pr_dbg("Got %d completion(s) from cq %p\n", ne, ibcq);
> +
> +    for (i = 0; i < ne; i++) {
> +        pr_dbg("wr_id=0x%lx\n", wc[i].wr_id);
> +        pr_dbg("status=%d\n", wc[i].status);
> +
> +        bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, wc[i].wr_id);
> +        if (unlikely(!bctx)) {
> +            pr_dbg("Error: Fail to find ctx for req %ld\n", wc[i].wr_id);

s/Fail/Failed/

(There are a lot of these throughout the various files. Just thought I'd
point that out, but I don't really have time to do a real review.)

> +            continue;
> +        }
> +        pr_dbg("Processing %s CQE\n", bctx->is_tx_req ? "send" : "recv");
> +
> +        comp_handler(wc[i].status, wc[i].vendor_err, bctx->up_ctx);
> +
> +        rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
> +        free(bctx);
> +    }
> +}

(...)

> diff --git a/hw/rdma/vmw/pvrdma_dev_api.h b/hw/rdma/vmw/pvrdma_dev_api.h
> new file mode 100644
> index 0000000000..bf1986a976
> --- /dev/null
> +++ b/hw/rdma/vmw/pvrdma_dev_api.h
> @@ -0,0 +1,602 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA device definitions
> + *
> + * Copyright (C) 2018 Oracle
> + * Copyright (C) 2018 Red Hat Inc
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@oracle.com>
> + *     Marcel Apfelbaum <marcel@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_DEV_API_H
> +#define PVRDMA_DEV_API_H
> +
> +/*
> + * Following is an interface definition for PVRDMA device as provided by
> + * VMWARE.
> + * See original copyright from Linux kernel v4.14.5 header file
> + * drivers/infiniband/hw/vmw_pvrdma/pvrdma_dev_api.h

Could that file be exported as UAPI in the kernel and added to the
linux-headers script?

(...)

> diff --git a/hw/rdma/vmw/pvrdma_ib_verbs.h b/hw/rdma/vmw/pvrdma_ib_verbs.h
> new file mode 100644
> index 0000000000..cf1430024b
> --- /dev/null
> +++ b/hw/rdma/vmw/pvrdma_ib_verbs.h
> @@ -0,0 +1,433 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA device definitions
> + *
> + * Copyright (C) 2018 Oracle
> + * Copyright (C) 2018 Red Hat Inc
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@oracle.com>
> + *     Marcel Apfelbaum <marcel@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_IB_VERBS_H
> +#define PVRDMA_IB_VERBS_H
> +
> +/*
> + * VMWARE headers we got from Linux kernel do not fully comply QEMU coding
> + * standards in sense of types and defines used.
> + * Since we didn't want to change VMWARE code, following set of typedefs
> + * and defines needed to compile these headers with QEMU introduced.
> + */
> +
> +#define u8     uint8_t
> +#define u16    unsigned short
> +#define u32    uint32_t
> +#define u64    uint64_t

I think the headers update already takes care of some conversions.
Otherwise, same comment as for the header above.
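
As a minimal sketch of the type mapping under discussion, assuming the
kernel-style short names are kept: fixed-width typedefs make each width
explicit and let the compiler check it. Note that the quoted hunk maps u16
to unsigned short rather than uint16_t. The typedefs below are illustrative
only; they are not taken from the patch or from the headers update script.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Illustrative only: kernel-style names mapped to fixed-width types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Compile-time checks that each alias has the width its name implies. */
static_assert(sizeof(u8)  == 1, "u8 must be 1 byte");
static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");
static_assert(sizeof(u64) == 8, "u64 must be 8 bytes");
```

Unlike object-like macros, typedefs obey scoping rules and do not rewrite
unrelated identifiers that happen to be named u8, u16, and so on.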

> +
> +/*
> + * Following is an interface definition for PVRDMA device as provided by
> + * VMWARE.
> + * See original copyright from Linux kernel v4.14.5 header file
> + * drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
> + */

(...)

> diff --git a/hw/rdma/vmw/vmw_pvrdma-abi.h b/hw/rdma/vmw/vmw_pvrdma-abi.h
> new file mode 100644
> index 0000000000..8cfb9d7745
> --- /dev/null
> +++ b/hw/rdma/vmw/vmw_pvrdma-abi.h
> @@ -0,0 +1,311 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA device definitions
> + *
> + * Copyright (C) 2018 Oracle
> + * Copyright (C) 2018 Red Hat Inc
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@oracle.com>
> + *     Marcel Apfelbaum <marcel@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef VMW_PVRDMA_ABI_H
> +#define VMW_PVRDMA_ABI_H
> +
> +/*
> + * Following is an interface definition for PVRDMA device as provided by
> + * VMWARE.
> + * See original copyright from Linux kernel v4.14.5 header file
> + * include/uapi/rdma/vmw_pvrdma-abi.h
> + */

This one is already exported.


