From: Jason Gunthorpe <jgg@nvidia.com>
To: kaike.wan@intel.com
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, todd.rimmer@intel.com
Subject: Re: [PATCH RFC 1/9] RDMA/rv: Public interferce for the RDMA Rendezvous module
Date: Fri, 19 Mar 2021 13:00:24 -0300 [thread overview]
Message-ID: <20210319160024.GW2356281@nvidia.com> (raw)
In-Reply-To: <20210319125635.34492-2-kaike.wan@intel.com>
On Fri, Mar 19, 2021 at 08:56:27AM -0400, kaike.wan@intel.com wrote:
> From: Kaike Wan <kaike.wan@intel.com>
>
> The RDMA Rendezvous (rv) module provides an interface for HPC
> middlewares to improve performance by caching memory region
> registration, and improve the scalibity of RDMA transaction
> through connection managements between nodes. This mechanism
> is implemented through the following ioctl requests:
> - ATTACH: to attach to an RDMA device.
> - REG_MEM: to register a user/kernel memory region.
> - DEREG_MEM: to release application use of MR, allowing it to
> remain in cache.
> - GET_CACHE_STATS: to get cache statistics.
> - CONN_CREATE: to create an RC connection.
> - CONN_CONNECT: to start the connection.
> - CONN_GET_CONN_COUNT: to use as part of error recovery from lost
> messages in the application.
> - CONN_GET_STATS: to get connection statistics.
> - GET_EVENT_STATS: to get the RDMA event statistics.
> - POST_RDMA_WR_IMMED: to post an RDMA WRITE WITH IMMED request.
>
> Signed-off-by: Todd Rimmer <todd.rimmer@intel.com>
> Signed-off-by: Kaike Wan <kaike.wan@intel.com>
> include/uapi/rdma/rv_user_ioctls.h | 558 +++++++++++++++++++++++++++++
> 1 file changed, 558 insertions(+)
> create mode 100644 include/uapi/rdma/rv_user_ioctls.h
>
> diff --git a/include/uapi/rdma/rv_user_ioctls.h b/include/uapi/rdma/rv_user_ioctls.h
> new file mode 100644
> index 000000000000..97e35b722443
> +++ b/include/uapi/rdma/rv_user_ioctls.h
> @@ -0,0 +1,558 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
> +/*
> + * Copyright(c) 2020 - 2021 Intel Corporation.
> + */
> +#ifndef __RV_USER_IOCTL_H__
> +#define __RV_USER_IOCTL_H__
> +#include <rdma/rdma_user_ioctl.h>
> +#include <rdma/ib_user_sa.h>
> +#include <rdma/ib_user_verbs.h>
> +
> +/* Checking /Documentation/userspace-api/ioctl/ioctl-number.rst */
> +#define RV_MAGIC RDMA_IOCTL_MAGIC
No.
> +#define RV_FILE_NAME "/dev/rv"
No.
> +
> +/*
> + * Handles are opaque to application; they are meaningful only to the
> + * RV driver
> + */
> +
> +/* this version of ABI */
> +#define RV_ABI_VER_MAJOR 1
> +#define RV_ABI_VER_MINOR 0
No, see my remarks to your other intel colleagues doing the ioasid
stuff.
> +struct rv_query_params_out {
> + /* ABI version */
> + __u16 major_rev;
> + __u16 minor_rev;
> + __u32 resv1;
> + __aligned_u64 capability;
> + __aligned_u64 resv2[6];
No pre-adding reserved stuff
> +};
> +
> +#define RV_IOCTL_QUERY _IOR(RV_MAGIC, 0xFC, struct rv_query_params_out)
> +
> +/* Mode for use of rv module by PSM */
> +#define RV_RDMA_MODE_USER 0 /* user MR caching only */
> +#define RV_RDMA_MODE_KERNEL 1 /* + kernel RC QPs with kernel MR caching */
Huh? Mode sounds bad.
> +/*
> + * mr_cache_size is in MBs and if 0 will use module param as default
> + * num_conn - number of QPs between each pair of nodes
> + * loc_addr - used to select client/listen vs rem_addr
> + * index_bits - num high bits of immed data with rv index
> + * loc_gid_index - SGID for client connections
> + * loc_gid[16] - to double check gid_index unchanged
> + * job_key[RV_MAX_JOB_KEY_LEN] = unique uuid per job
> + * job_key_len - len, if 0 matches jobs with len==0 only
> + * q_depth - size of QP and per QP CQs
> + * reconnect_timeout - in seconds from loss to restoration
> + * hb_interval - in milliseconds between heartbeats
> + */
> +struct rv_attach_params_in {
> + char dev_name[RV_MAX_DEV_NAME_LEN];
Guessing this is a no too.
> + __u32 mr_cache_size;
> + __u8 rdma_mode;
> +
> + /* additional information for RV_RDMA_MODE_KERNEL */
> + __u8 port_num;
> + __u8 num_conn;
Lots of alignment holes, don't do that either.
> +struct rv_attach_params {
> + union {
> + struct rv_attach_params_in in;
> + struct rv_attach_params_out out;
> + };
> +};
Yikes, no
> +/* The buffer is used to register a kernel mr */
> +#define IBV_ACCESS_KERNEL 0x80000000
Huh? WTF on on many levels
> +/*
> + * ibv_pd_handle - user space appl allocated pd
> + * ulen - driver_udata inlen
> + * *udata - driver_updata inbuf
> + */
> +struct rv_mem_params_in {
> + __u32 ibv_pd_handle;
> + __u32 cmd_fd_int;
Really? You want to reach into the command FD from a ULP and extract
objects? Double yikes. Did you do this properly, taking care of every
lifetime rule and handling disassociation?
> + __aligned_u64 addr;
> + __aligned_u64 length;
> + __u32 access;
> + size_t ulen;
> + void *udata;
'void *' is wrong for ioctls.
> +struct rv_conn_get_stats_params_out {
Too many stats, don't you think? Most of the header seems to be stats
of one thing or another.
> +/*
> + * events placed on ring buffer for delivery to user space.
> + * Carefully sized to be a multiple of 64 bytes for cache alignment.
> + * Must pack to get good field alignment and desired 64B overall size
> + * Unlike verbs, all rv_event fields are defined even when
> + * rv_event.wc.status != IB_WC_SUCCESS. Only sent writes can report bad status.
> + * event_type - enum rv_event_type
> + * wc - send or recv work completions
> + * status - ib_wc_status
> + * resv1 - alignment
> + * imm_data - for RV_WC_RECV_RDMA_WITH_IMM only
> + * wr_id - PSM wr_id for RV_WC_RDMA_WRITE only
> + * conn_handle - conn handle. For efficiency in completion processing, this
> + * handle is the rv_conn handle, not the rv_user_conn.
> + * Main use is sanity checks. On Recv PSM must use imm_data to
> + * efficiently identify source.
> + * byte_len - unlike verbs API, this is always valid
> + * resv2 - alignment
> + * cache_align - not used, but forces overall struct to 64B size
> + */
> +struct rv_event {
> + __u8 event_type;
> + union {
> + struct {
> + __u8 status;
> + __u16 resv1;
> + __u32 imm_data;
> + __aligned_u64 wr_id;
> + __aligned_u64 conn_handle;
> + __u32 byte_len;
> + __u32 resv2;
> + } __attribute__((__packed__)) wc;
> + struct {
> + __u8 pad[7];
> + uint64_t pad2[7];
> + } __attribute__((__packed__)) cache_align;
> + };
> +} __attribute__((__packed__));
Uhh, mixing packed and aligned_u64 is pretty silly. I don't think this
needs to be packed or written in this tortured way.
> +
> +/*
> + * head - consumer removes here
> + * tail - producer adds here
> + * overflow_cnt - number of times producer overflowed ring and discarded
> + * pad - 64B cache alignment for entries
> + */
> +struct rv_ring_header {
> + volatile __u32 head;
> + volatile __u32 tail;
> + volatile __u64 overflow_cnt;
No on volatile, and missed a __aligned here
This uapi needs to be much better. It looks like the mess from the PSM
char dev just re-imported here.
At the very least split the caching from the other operations and
follow normal ioctl design
And you need to rethink the uverbs stuff.
Jason
next prev parent reply other threads:[~2021-03-19 16:01 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-19 12:56 [PATCH RFC 0/9] A rendezvous module kaike.wan
2021-03-19 12:56 ` [PATCH RFC 1/9] RDMA/rv: Public interferce for the RDMA Rendezvous module kaike.wan
2021-03-19 16:00 ` Jason Gunthorpe [this message]
2021-03-19 12:56 ` [PATCH RFC 2/9] RDMA/rv: Add the internal header files kaike.wan
2021-03-19 16:02 ` Jason Gunthorpe
2021-03-19 12:56 ` [PATCH RFC 3/9] RDMA/rv: Add the rv module kaike.wan
2021-03-19 12:56 ` [PATCH RFC 4/9] RDMA/rv: Add functions for memory region cache kaike.wan
2021-03-19 12:56 ` [PATCH RFC 5/9] RDMA/rv: Add function to register/deregister memory region kaike.wan
2021-03-19 12:56 ` [PATCH RFC 6/9] RDMA/rv: Add connection management functions kaike.wan
2021-03-19 12:56 ` [PATCH RFC 7/9] RDMA/rv: Add functions for RDMA transactions kaike.wan
2021-03-19 12:56 ` [PATCH RFC 8/9] RDMA/rv: Add functions for file operations kaike.wan
2021-03-19 12:56 ` [PATCH RFC 9/9] RDMA/rv: Integrate the file operations into the rv module kaike.wan
2021-03-19 13:53 ` [PATCH RFC 0/9] A rendezvous module Jason Gunthorpe
2021-03-19 14:49 ` Wan, Kaike
2021-03-19 15:48 ` Jason Gunthorpe
2021-03-19 19:22 ` Dennis Dalessandro
2021-03-19 19:44 ` Jason Gunthorpe
2021-03-19 20:12 ` Rimmer, Todd
2021-03-19 20:26 ` Jason Gunthorpe
2021-03-19 20:46 ` Rimmer, Todd
2021-03-19 20:54 ` Jason Gunthorpe
2021-03-19 20:59 ` Wan, Kaike
2021-03-19 21:28 ` Dennis Dalessandro
2021-03-19 21:58 ` Wan, Kaike
2021-03-19 22:35 ` Jason Gunthorpe
2021-03-19 22:57 ` Rimmer, Todd
2021-03-19 23:06 ` Jason Gunthorpe
2021-03-20 16:39 ` Dennis Dalessandro
2021-03-21 8:56 ` Leon Romanovsky
2021-03-21 16:24 ` Dennis Dalessandro
2021-03-21 16:45 ` Jason Gunthorpe
2021-03-21 17:21 ` Dennis Dalessandro
2021-03-21 18:08 ` Jason Gunthorpe
2021-03-22 15:17 ` Rimmer, Todd
2021-03-22 16:47 ` Jason Gunthorpe
2021-03-22 17:31 ` Hefty, Sean
2021-03-23 22:56 ` Jason Gunthorpe
2021-03-23 23:29 ` Rimmer, Todd
2021-03-21 19:19 ` Wan, Kaike
2021-03-23 15:36 ` Christoph Hellwig
2021-03-23 15:35 ` Christoph Hellwig
2021-03-23 15:33 ` Christoph Hellwig
2021-03-23 15:30 ` Christoph Hellwig
2021-03-23 15:46 ` Jason Gunthorpe
2021-03-23 16:07 ` Christoph Hellwig
2021-03-23 17:25 ` Rimmer, Todd
2021-03-23 17:44 ` Jason Gunthorpe
2021-03-19 20:18 ` Dennis Dalessandro
2021-03-19 20:30 ` Jason Gunthorpe
2021-03-19 20:34 ` Hefty, Sean
2021-03-21 12:08 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210319160024.GW2356281@nvidia.com \
--to=jgg@nvidia.com \
--cc=dledford@redhat.com \
--cc=kaike.wan@intel.com \
--cc=linux-rdma@vger.kernel.org \
--cc=todd.rimmer@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox