Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* RE: Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-14 12:05 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161114120153.GC4240-2ukJVAZIZ/Y@public.gmane.org>

> There is addition of 1 in mlx4_ib_create_cq():
> 192         entries      = roundup_pow_of_two(entries + 1);
> 193         cq->ibcq.cqe = entries - 1;

I thought something else might hide there.
Thanks,
Ram

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Configuration of cq->cqe is lower than entries by 1
From: Leon Romanovsky @ 2016-11-14 12:01 UTC (permalink / raw)
  To: Amrani, Ram; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR07MB22076C5DE03F1939603C554CF8BC0-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

On Mon, Nov 14, 2016 at 11:07:53AM +0000, Amrani, Ram wrote:
> Hi Leon, All,
> While inspecting MLX code as well as other vendors' I see that the actual number of cq->cqe is configured to be less by 1 than 'entries'. Why is that?

There is addition of 1 in mlx4_ib_create_cq():
192         entries      = roundup_pow_of_two(entries + 1);
193         cq->ibcq.cqe = entries - 1;

The same goes for mlx4_alloc_resize_buf, just earlier in the stack.

>
> e.g.
> struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
>                                 const struct ib_cq_init_attr *attr,
>                                 struct ib_ucontext *context,
>                                 struct ib_udata *udata)
> {
>         ...
>         cq->ibcq.cqe = entries - 1;
>         ...
> }
>
>
> static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
>                                   int entries)
> {
>         ...
>         cq->resize_buf->cqe = entries - 1;	// this is later copied to cq->ibcq.cqe
>         ...
> }
>
> Thanks,
> Ram
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v6 12/16] IB/pvrdma: Add Queue Pair support
From: Yuval Shaia @ 2016-11-14 11:34 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	pv-drivers-pghWNbHTmq7QT0dZR+AlfA, jhansen-pghWNbHTmq7QT0dZR+AlfA,
	asarwade-pghWNbHTmq7QT0dZR+AlfA,
	georgezhang-pghWNbHTmq7QT0dZR+AlfA,
	bryantan-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <6a643e92376856394d45638d80a90619d3abac37.1475458407.git.aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>

On Sun, Oct 02, 2016 at 07:10:32PM -0700, Adit Ranadive wrote:
> This patch adds the ability to create, modify, query and destroy QPs. The
> PVRDMA device supports RC, UD and GSI QPs.
> 
> Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Jorgen Hansen <jhansen-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: George Zhang <georgezhang-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Aditya Sarwade <asarwade-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Bryan Tan <bryantan-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> ---
> Changes v5->v6:
>  - Removed a duplicate include of the ABI header.
> 
> Changes v4->v5:
>  - Updated include for headers in UAPI folder.
>  - Update to pvrdma_cmd_post for creating/destroying/querying/modifying QPs.
>  - Use the pvrdma_sge struct when posting WRs/allocating QP memory.
>  - Removed two set but unused variables.
> 
> Changes v3->v4:
>  - Removed an unnecessary switch case.
>  - Unified the returns in pvrdma_create_qp to use one exit point.
>  - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we need a lock to
>  be held when calling this.
>  - Updated to use wrapper for UAR write for QP.
>  - Updated conversion function to func_name(dst, src) format.
>  - Renamed max_gs to max_sg.
>  - Renamed cap variable to req_cap in pvrdma_set_sq/rq_size.
>  - Changed dev_warn to dev_warn_ratelimited in pvrdma_post_send/recv.
>  - Added nesting locking for flushing CQs when destroying/resetting a QP.
>  - Added missing ret value.
> 
> Changes v2->v3:
>  - Removed boolean in pvrdma_cmd_post.
> ---
>  drivers/infiniband/hw/pvrdma/pvrdma_qp.c | 972 +++++++++++++++++++++++++++++++
>  1 file changed, 972 insertions(+)
>  create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_qp.c
> 
> diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
> new file mode 100644
> index 0000000..c8c01e5
> --- /dev/null
> +++ b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
> @@ -0,0 +1,972 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <asm/page.h>
> +#include <linux/io.h>
> +#include <linux/wait.h>
> +#include <rdma/ib_addr.h>
> +#include <rdma/ib_smi.h>
> +#include <rdma/ib_user_verbs.h>
> +
> +#include "pvrdma.h"
> +
> +static inline void get_cqs(struct pvrdma_qp *qp, struct pvrdma_cq **send_cq,
> +			   struct pvrdma_cq **recv_cq)
> +{
> +	*send_cq = to_vcq(qp->ibqp.send_cq);
> +	*recv_cq = to_vcq(qp->ibqp.recv_cq);
> +}
> +
> +static void pvrdma_lock_cqs(struct pvrdma_cq *scq, struct pvrdma_cq *rcq,
> +			    unsigned long *scq_flags,
> +			    unsigned long *rcq_flags)
> +	__acquires(scq->cq_lock) __acquires(rcq->cq_lock)
> +{
> +	if (scq == rcq) {
> +		spin_lock_irqsave(&scq->cq_lock, *scq_flags);
> +		__acquire(rcq->cq_lock);
> +	} else if (scq->cq_handle < rcq->cq_handle) {
> +		spin_lock_irqsave(&scq->cq_lock, *scq_flags);
> +		spin_lock_irqsave_nested(&rcq->cq_lock, *rcq_flags,
> +					 SINGLE_DEPTH_NESTING);
> +	} else {
> +		spin_lock_irqsave(&rcq->cq_lock, *rcq_flags);
> +		spin_lock_irqsave_nested(&scq->cq_lock, *scq_flags,
> +					 SINGLE_DEPTH_NESTING);
> +	}
> +}
> +
> +static void pvrdma_unlock_cqs(struct pvrdma_cq *scq, struct pvrdma_cq *rcq,
> +			      unsigned long *scq_flags,
> +			      unsigned long *rcq_flags)
> +	__releases(scq->cq_lock) __releases(rcq->cq_lock)
> +{
> +	if (scq == rcq) {
> +		__release(rcq->cq_lock);
> +		spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> +	} else if (scq->cq_handle < rcq->cq_handle) {
> +		spin_unlock_irqrestore(&rcq->cq_lock, *rcq_flags);
> +		spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> +	} else {
> +		spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> +		spin_unlock_irqrestore(&rcq->cq_lock, *rcq_flags);
> +	}
> +}
> +
> +static void pvrdma_reset_qp(struct pvrdma_qp *qp)
> +{
> +	struct pvrdma_cq *scq, *rcq;
> +	unsigned long scq_flags, rcq_flags;
> +
> +	/* Clean up cqes */
> +	get_cqs(qp, &scq, &rcq);
> +	pvrdma_lock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> +	_pvrdma_flush_cqe(qp, scq);
> +	if (scq != rcq)
> +		_pvrdma_flush_cqe(qp, rcq);
> +
> +	pvrdma_unlock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> +	/*
> +	 * Reset queuepair. The checks are because usermode queuepairs won't
> +	 * have kernel ringstates.
> +	 */
> +	if (qp->rq.ring) {
> +		atomic_set(&qp->rq.ring->cons_head, 0);
> +		atomic_set(&qp->rq.ring->prod_tail, 0);
> +	}
> +	if (qp->sq.ring) {
> +		atomic_set(&qp->sq.ring->cons_head, 0);
> +		atomic_set(&qp->sq.ring->prod_tail, 0);
> +	}
> +}
> +
> +static int pvrdma_set_rq_size(struct pvrdma_dev *dev,
> +			      struct ib_qp_cap *req_cap,
> +			      struct pvrdma_qp *qp)
> +{
> +	if (req_cap->max_recv_wr > dev->dsr->caps.max_qp_wr ||
> +	    req_cap->max_recv_sge > dev->dsr->caps.max_sge) {
> +		dev_warn(&dev->pdev->dev, "recv queue size invalid\n");
> +		return -EINVAL;
> +	}
> +
> +	qp->rq.wqe_cnt = roundup_pow_of_two(max(1U, req_cap->max_recv_wr));
> +	qp->rq.max_sg = roundup_pow_of_two(max(1U, req_cap->max_recv_sge));
> +
> +	/* Write back */
> +	req_cap->max_recv_wr = qp->rq.wqe_cnt;
> +	req_cap->max_recv_sge = qp->rq.max_sg;
> +
> +	qp->rq.wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_rq_wqe_hdr) +
> +					     sizeof(struct pvrdma_sge) *
> +					     qp->rq.max_sg);
> +	qp->npages_recv = (qp->rq.wqe_cnt * qp->rq.wqe_size + PAGE_SIZE - 1) /
> +			  PAGE_SIZE;
> +
> +	return 0;
> +}
> +
> +static int pvrdma_set_sq_size(struct pvrdma_dev *dev, struct ib_qp_cap *req_cap,
> +			      enum ib_qp_type type, struct pvrdma_qp *qp)
> +{
> +	if (req_cap->max_send_wr > dev->dsr->caps.max_qp_wr ||
> +	    req_cap->max_send_sge > dev->dsr->caps.max_sge) {
> +		dev_warn(&dev->pdev->dev, "send queue size invalid\n");
> +		return -EINVAL;
> +	}
> +
> +	qp->sq.wqe_cnt = roundup_pow_of_two(max(1U, req_cap->max_send_wr));
> +	qp->sq.max_sg = roundup_pow_of_two(max(1U, req_cap->max_send_sge));
> +
> +	/* Write back */
> +	req_cap->max_send_wr = qp->sq.wqe_cnt;
> +	req_cap->max_send_sge = qp->sq.max_sg;
> +
> +	qp->sq.wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_sq_wqe_hdr) +
> +					     sizeof(struct pvrdma_sge) *
> +					     qp->sq.max_sg);
> +	/* Note: one extra page for the header. */
> +	qp->npages_send = 1 + (qp->sq.wqe_cnt * qp->sq.wqe_size +
> +			       PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +	return 0;
> +}
> +
> +/**
> + * pvrdma_create_qp - create queue pair
> + * @pd: protection domain
> + * @init_attr: queue pair attributes
> + * @udata: user data
> + *
> + * @return: the ib_qp pointer on success, otherwise returns an errno.
> + */
> +struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
> +			       struct ib_qp_init_attr *init_attr,
> +			       struct ib_udata *udata)
> +{
> +	struct pvrdma_qp *qp = NULL;
> +	struct pvrdma_dev *dev = to_vdev(pd->device);
> +	union pvrdma_cmd_req req;
> +	union pvrdma_cmd_resp rsp;
> +	struct pvrdma_cmd_create_qp *cmd = &req.create_qp;
> +	struct pvrdma_cmd_create_qp_resp *resp = &rsp.create_qp_resp;
> +	struct pvrdma_create_qp ucmd;
> +	unsigned long flags;
> +	int ret;
> +
> +	if (init_attr->create_flags) {
> +		dev_warn(&dev->pdev->dev,
> +			 "invalid create queuepair flags %#x\n",
> +			 init_attr->create_flags);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (init_attr->qp_type != IB_QPT_RC &&
> +	    init_attr->qp_type != IB_QPT_UD &&
> +	    init_attr->qp_type != IB_QPT_GSI) {
> +		dev_warn(&dev->pdev->dev, "queuepair type %d not supported\n",
> +			 init_attr->qp_type);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (!atomic_add_unless(&dev->num_qps, 1, dev->dsr->caps.max_qp))
> +		return ERR_PTR(-ENOMEM);
> +
> +	switch (init_attr->qp_type) {
> +	case IB_QPT_GSI:
> +		if (init_attr->port_num == 0 ||
> +		    init_attr->port_num > pd->device->phys_port_cnt ||
> +		    udata) {
> +			dev_warn(&dev->pdev->dev, "invalid queuepair attrs\n");
> +			ret = -EINVAL;
> +			goto err_qp;
> +		}
> +		/* fall through */
> +	case IB_QPT_RC:
> +	case IB_QPT_UD:
> +		qp = kzalloc(sizeof(*qp), GFP_KERNEL);
> +		if (!qp) {
> +			ret = -ENOMEM;
> +			goto err_qp;
> +		}
> +
> +		spin_lock_init(&qp->sq.lock);
> +		spin_lock_init(&qp->rq.lock);
> +		mutex_init(&qp->mutex);
> +		atomic_set(&qp->refcnt, 1);
> +		init_waitqueue_head(&qp->wait);
> +
> +		qp->state = IB_QPS_RESET;
> +
> +		if (pd->uobject && udata) {
> +			dev_dbg(&dev->pdev->dev,
> +				"create queuepair from user space\n");
> +
> +			if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
> +				ret = -EFAULT;
> +				goto err_qp;
> +			}
> +
> +			/* set qp->sq.wqe_cnt, shift, buf_size.. */
> +			qp->rumem = ib_umem_get(pd->uobject->context,
> +						ucmd.rbuf_addr,
> +						ucmd.rbuf_size, 0, 0);
> +			if (IS_ERR(qp->rumem)) {
> +				ret = PTR_ERR(qp->rumem);
> +				goto err_qp;
> +			}
> +
> +			qp->sumem = ib_umem_get(pd->uobject->context,
> +						ucmd.sbuf_addr,
> +						ucmd.sbuf_size, 0, 0);
> +			if (IS_ERR(qp->sumem)) {
> +				ib_umem_release(qp->rumem);
> +				ret = PTR_ERR(qp->sumem);
> +				goto err_qp;
> +			}
> +
> +			qp->npages_send = ib_umem_page_count(qp->sumem);
> +			qp->npages_recv = ib_umem_page_count(qp->rumem);
> +			qp->npages = qp->npages_send + qp->npages_recv;
> +		} else {
> +			qp->is_kernel = true;
> +
> +			ret = pvrdma_set_sq_size(to_vdev(pd->device),
> +						 &init_attr->cap,
> +						 init_attr->qp_type, qp);
> +			if (ret)
> +				goto err_qp;
> +
> +			ret = pvrdma_set_rq_size(to_vdev(pd->device),
> +						 &init_attr->cap, qp);
> +			if (ret)
> +				goto err_qp;
> +
> +			qp->npages = qp->npages_send + qp->npages_recv;
> +
> +			/* Skip header page. */
> +			qp->sq.offset = PAGE_SIZE;
> +
> +			/* Recv queue pages are after send pages. */
> +			qp->rq.offset = qp->npages_send * PAGE_SIZE;
> +		}
> +
> +		if (qp->npages < 0 || qp->npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
> +			dev_warn(&dev->pdev->dev,
> +				 "overflow pages in queuepair\n");
> +			ret = -EINVAL;
> +			goto err_umem;
> +		}
> +
> +		ret = pvrdma_page_dir_init(dev, &qp->pdir, qp->npages,
> +					   qp->is_kernel);
> +		if (ret) {
> +			dev_warn(&dev->pdev->dev,
> +				 "could not allocate page directory\n");
> +			goto err_umem;
> +		}
> +
> +		if (!qp->is_kernel) {
> +			pvrdma_page_dir_insert_umem(&qp->pdir, qp->sumem, 0);
> +			pvrdma_page_dir_insert_umem(&qp->pdir, qp->rumem,
> +						    qp->npages_send);
> +		} else {
> +			/* Ring state is always the first page. */
> +			qp->sq.ring = qp->pdir.pages[0];
> +			qp->rq.ring = &qp->sq.ring[1];
> +		}
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto err_qp;
> +	}
> +
> +	/* Not supported */
> +	init_attr->cap.max_inline_data = 0;
> +
> +	memset(cmd, 0, sizeof(*cmd));
> +	cmd->hdr.cmd = PVRDMA_CMD_CREATE_QP;
> +	cmd->pd_handle = to_vpd(pd)->pd_handle;
> +	cmd->send_cq_handle = to_vcq(init_attr->send_cq)->cq_handle;
> +	cmd->recv_cq_handle = to_vcq(init_attr->recv_cq)->cq_handle;
> +	cmd->max_send_wr = init_attr->cap.max_send_wr;
> +	cmd->max_recv_wr = init_attr->cap.max_recv_wr;
> +	cmd->max_send_sge = init_attr->cap.max_send_sge;
> +	cmd->max_recv_sge = init_attr->cap.max_recv_sge;
> +	cmd->max_inline_data = init_attr->cap.max_inline_data;
> +	cmd->sq_sig_all = (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) ? 1 : 0;
> +	cmd->qp_type = ib_qp_type_to_pvrdma(init_attr->qp_type);
> +	cmd->access_flags = IB_ACCESS_LOCAL_WRITE;
> +	cmd->total_chunks = qp->npages;
> +	cmd->send_chunks = qp->npages_send - 1;
> +	cmd->pdir_dma = qp->pdir.dir_dma;
> +
> +	dev_dbg(&dev->pdev->dev, "create queuepair with %d, %d, %d, %d\n",
> +		cmd->max_send_wr, cmd->max_recv_wr, cmd->max_send_sge,
> +		cmd->max_recv_sge);
> +
> +	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_QP_RESP);
> +	if (ret < 0) {
> +		dev_warn(&dev->pdev->dev,
> +			 "could not create queuepair, error: %d\n", ret);
> +		goto err_pdir;
> +	}
> +
> +	/* max_send_wr/_recv_wr/_send_sge/_recv_sge/_inline_data */
> +	qp->qp_handle = resp->qpn;
> +	qp->port = init_attr->port_num;
> +	qp->ibqp.qp_num = resp->qpn;
> +	spin_lock_irqsave(&dev->qp_tbl_lock, flags);
> +	dev->qp_tbl[qp->qp_handle % dev->dsr->caps.max_qp] = qp;
> +	spin_unlock_irqrestore(&dev->qp_tbl_lock, flags);
> +
> +	return &qp->ibqp;
> +
> +err_pdir:
> +	pvrdma_page_dir_cleanup(dev, &qp->pdir);
> +err_umem:
> +	if (pd->uobject && udata) {
> +		if (qp->rumem)
> +			ib_umem_release(qp->rumem);
> +		if (qp->sumem)
> +			ib_umem_release(qp->sumem);
> +	}
> +err_qp:
> +	kfree(qp);
> +	atomic_dec(&dev->num_qps);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void pvrdma_free_qp(struct pvrdma_qp *qp)
> +{
> +	struct pvrdma_dev *dev = to_vdev(qp->ibqp.device);
> +	struct pvrdma_cq *scq;
> +	struct pvrdma_cq *rcq;
> +	unsigned long flags, scq_flags, rcq_flags;
> +
> +	/* In case cq is polling */
> +	get_cqs(qp, &scq, &rcq);
> +	pvrdma_lock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> +	_pvrdma_flush_cqe(qp, scq);
> +	if (scq != rcq)
> +		_pvrdma_flush_cqe(qp, rcq);
> +
> +	spin_lock_irqsave(&dev->qp_tbl_lock, flags);
> +	dev->qp_tbl[qp->qp_handle] = NULL;
> +	spin_unlock_irqrestore(&dev->qp_tbl_lock, flags);
> +
> +	pvrdma_unlock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> +	atomic_dec(&qp->refcnt);
> +	wait_event(qp->wait, !atomic_read(&qp->refcnt));
> +
> +	pvrdma_page_dir_cleanup(dev, &qp->pdir);
> +
> +	kfree(qp);
> +
> +	atomic_dec(&dev->num_qps);
> +}
> +
> +/**
> + * pvrdma_destroy_qp - destroy a queue pair
> + * @qp: the queue pair to destroy
> + *
> + * @return: 0 on success.
> + */
> +int pvrdma_destroy_qp(struct ib_qp *qp)
> +{
> +	struct pvrdma_qp *vqp = to_vqp(qp);
> +	union pvrdma_cmd_req req;
> +	struct pvrdma_cmd_destroy_qp *cmd = &req.destroy_qp;
> +	int ret;
> +
> +	memset(cmd, 0, sizeof(*cmd));
> +	cmd->hdr.cmd = PVRDMA_CMD_DESTROY_QP;
> +	cmd->qp_handle = vqp->qp_handle;
> +
> +	ret = pvrdma_cmd_post(to_vdev(qp->device), &req, NULL, 0);
> +	if (ret < 0)
> +		dev_warn(&to_vdev(qp->device)->pdev->dev,
> +			 "destroy queuepair failed, error: %d\n", ret);
> +
> +	pvrdma_free_qp(vqp);
> +
> +	return 0;
> +}
> +
> +/**
> + * pvrdma_modify_qp - modify queue pair attributes
> + * @ibqp: the queue pair
> + * @attr: the new queue pair's attributes
> + * @attr_mask: attributes mask
> + * @udata: user data
> + *
> + * @returns 0 on success, otherwise returns an errno.
> + */
> +int pvrdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
> +		     int attr_mask, struct ib_udata *udata)
> +{
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	union pvrdma_cmd_req req;
> +	union pvrdma_cmd_resp rsp;
> +	struct pvrdma_cmd_modify_qp *cmd = &req.modify_qp;
> +	int cur_state, next_state;
> +	int ret;
> +
> +	/* Sanity checking. Should need lock here */
> +	mutex_lock(&qp->mutex);
> +	cur_state = (attr_mask & IB_QP_CUR_STATE) ? attr->cur_qp_state :
> +		qp->state;
> +	next_state = (attr_mask & IB_QP_STATE) ? attr->qp_state : cur_state;
> +
> +	if (!ib_modify_qp_is_ok(cur_state, next_state, ibqp->qp_type,
> +				attr_mask, IB_LINK_LAYER_ETHERNET)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (attr_mask & IB_QP_PORT) {
> +		if (attr->port_num == 0 ||
> +		    attr->port_num > ibqp->device->phys_port_cnt) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +	}
> +
> +	if (attr_mask & IB_QP_MIN_RNR_TIMER) {
> +		if (attr->min_rnr_timer > 31) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +	}
> +
> +	if (attr_mask & IB_QP_PKEY_INDEX) {
> +		if (attr->pkey_index >= dev->dsr->caps.max_pkeys) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +	}
> +
> +	if (attr_mask & IB_QP_QKEY)
> +		qp->qkey = attr->qkey;
> +
> +	if (cur_state == next_state && cur_state == IB_QPS_RESET) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	qp->state = next_state;
> +	memset(cmd, 0, sizeof(*cmd));
> +	cmd->hdr.cmd = PVRDMA_CMD_MODIFY_QP;
> +	cmd->qp_handle = qp->qp_handle;
> +	cmd->attr_mask = ib_qp_attr_mask_to_pvrdma(attr_mask);
> +	cmd->attrs.qp_state = ib_qp_state_to_pvrdma(attr->qp_state);
> +	cmd->attrs.cur_qp_state =
> +		ib_qp_state_to_pvrdma(attr->cur_qp_state);
> +	cmd->attrs.path_mtu = ib_mtu_to_pvrdma(attr->path_mtu);
> +	cmd->attrs.path_mig_state =
> +		ib_mig_state_to_pvrdma(attr->path_mig_state);
> +	cmd->attrs.qkey = attr->qkey;
> +	cmd->attrs.rq_psn = attr->rq_psn;
> +	cmd->attrs.sq_psn = attr->sq_psn;
> +	cmd->attrs.dest_qp_num = attr->dest_qp_num;
> +	cmd->attrs.qp_access_flags =
> +		ib_access_flags_to_pvrdma(attr->qp_access_flags);
> +	cmd->attrs.pkey_index = attr->pkey_index;
> +	cmd->attrs.alt_pkey_index = attr->alt_pkey_index;
> +	cmd->attrs.en_sqd_async_notify = attr->en_sqd_async_notify;
> +	cmd->attrs.sq_draining = attr->sq_draining;
> +	cmd->attrs.max_rd_atomic = attr->max_rd_atomic;
> +	cmd->attrs.max_dest_rd_atomic = attr->max_dest_rd_atomic;
> +	cmd->attrs.min_rnr_timer = attr->min_rnr_timer;
> +	cmd->attrs.port_num = attr->port_num;
> +	cmd->attrs.timeout = attr->timeout;
> +	cmd->attrs.retry_cnt = attr->retry_cnt;
> +	cmd->attrs.rnr_retry = attr->rnr_retry;
> +	cmd->attrs.alt_port_num = attr->alt_port_num;
> +	cmd->attrs.alt_timeout = attr->alt_timeout;
> +	ib_qp_cap_to_pvrdma(&cmd->attrs.cap, &attr->cap);
> +	ib_ah_attr_to_pvrdma(&cmd->attrs.ah_attr, &attr->ah_attr);
> +	ib_ah_attr_to_pvrdma(&cmd->attrs.alt_ah_attr, &attr->alt_ah_attr);
> +
> +	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_MODIFY_QP_RESP);
> +	if (ret < 0) {
> +		dev_warn(&dev->pdev->dev,
> +			 "could not modify queuepair, error: %d\n", ret);
> +	} else if (rsp.hdr.err > 0) {
> +		dev_warn(&dev->pdev->dev,
> +			 "cannot modify queuepair, error: %d\n", rsp.hdr.err);
> +		ret = -EINVAL;
> +	}
> +
> +	if (ret == 0 && next_state == IB_QPS_RESET)
> +		pvrdma_reset_qp(qp);
> +
> +out:
> +	mutex_unlock(&qp->mutex);
> +
> +	return ret;
> +}
> +
> +static inline void *get_sq_wqe(struct pvrdma_qp *qp, int n)
> +{
> +	return pvrdma_page_dir_get_ptr(&qp->pdir,
> +				       qp->sq.offset + n * qp->sq.wqe_size);
> +}
> +
> +static inline void *get_rq_wqe(struct pvrdma_qp *qp, int n)
> +{
> +	return pvrdma_page_dir_get_ptr(&qp->pdir,
> +				       qp->rq.offset + n * qp->rq.wqe_size);
> +}
> +
> +static int set_reg_seg(struct pvrdma_sq_wqe_hdr *wqe_hdr, struct ib_reg_wr *wr)
> +{
> +	struct pvrdma_user_mr *mr = to_vmr(wr->mr);
> +
> +	wqe_hdr->wr.fast_reg.iova_start = mr->ibmr.iova;
> +	wqe_hdr->wr.fast_reg.pl_pdir_dma = mr->pdir.dir_dma;
> +	wqe_hdr->wr.fast_reg.page_shift = mr->page_shift;
> +	wqe_hdr->wr.fast_reg.page_list_len = mr->npages;
> +	wqe_hdr->wr.fast_reg.length = mr->ibmr.length;
> +	wqe_hdr->wr.fast_reg.access_flags = wr->access;
> +	wqe_hdr->wr.fast_reg.rkey = wr->key;
> +
> +	return pvrdma_page_dir_insert_page_list(&mr->pdir, mr->pages,
> +						mr->npages);
> +}
> +
> +/**
> + * pvrdma_post_send - post send work request entries on a QP
> + * @ibqp: the QP
> + * @wr: work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> +		     struct ib_send_wr **bad_wr)
> +{
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	unsigned long flags;
> +	struct pvrdma_sq_wqe_hdr *wqe_hdr;
> +	struct pvrdma_sge *sge;
> +	int i, index;
> +	int nreq;
> +	int ret;
> +
> +	/*
> +	 * In states lower than RTS, we can fail immediately. In other states,
> +	 * just post and let the device figure it out.
> +	 */
> +	if (qp->state < IB_QPS_RTS) {
> +		*bad_wr = wr;
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_irqsave(&qp->sq.lock, flags);
> +
> +	index = pvrdma_idx(&qp->sq.ring->prod_tail, qp->sq.wqe_cnt);

Not sure if it was discussed so posting "just in case".
I believe it is unlikely that index will go out of range but since
pvrdma_idx might return PVRDMA_INVALID_IDX i suggest to add a check here.
Something like:
	if (unlikely(index == PVRDMA_INVALID_IDX)) {
		*bad_wr = wr;
		return -EFAULT;
	}

Same goes with pvrdma_post_recv

> +	for (nreq = 0; wr; nreq++, wr = wr->next) {
> +		unsigned int tail;
> +
> +		if (unlikely(!pvrdma_idx_ring_has_space(
> +				qp->sq.ring, qp->sq.wqe_cnt, &tail))) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "send queue is full\n");
> +			*bad_wr = wr;
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		if (unlikely(wr->num_sge > qp->sq.max_sg || wr->num_sge < 0)) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "send SGE overflow\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (unlikely(wr->opcode < 0)) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "invalid send opcode\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		/*
> +		 * Only support UD, RC.
> +		 * Need to check opcode table for thorough checking.
> +		 * opcode		_UD	_UC	_RC
> +		 * _SEND		x	x	x
> +		 * _SEND_WITH_IMM	x	x	x
> +		 * _RDMA_WRITE			x	x
> +		 * _RDMA_WRITE_WITH_IMM		x	x
> +		 * _LOCAL_INV			x	x
> +		 * _SEND_WITH_INV		x	x
> +		 * _RDMA_READ				x
> +		 * _ATOMIC_CMP_AND_SWP			x
> +		 * _ATOMIC_FETCH_AND_ADD		x
> +		 * _MASK_ATOMIC_CMP_AND_SWP		x
> +		 * _MASK_ATOMIC_FETCH_AND_ADD		x
> +		 * _REG_MR				x
> +		 *
> +		 */
> +		if (qp->ibqp.qp_type != IB_QPT_UD &&
> +		    qp->ibqp.qp_type != IB_QPT_RC &&
> +			wr->opcode != IB_WR_SEND) {
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "unsupported queuepair type\n");
> +			*bad_wr = wr;
> +			ret = -EINVAL;
> +			goto out;
> +		} else if (qp->ibqp.qp_type == IB_QPT_UD ||
> +			   qp->ibqp.qp_type == IB_QPT_GSI) {
> +			if (wr->opcode != IB_WR_SEND &&
> +			    wr->opcode != IB_WR_SEND_WITH_IMM) {
> +				dev_warn_ratelimited(&dev->pdev->dev,
> +						     "invalid send opcode\n");
> +				*bad_wr = wr;
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +		}
> +
> +		wqe_hdr = (struct pvrdma_sq_wqe_hdr *)get_sq_wqe(qp, index);
> +		memset(wqe_hdr, 0, sizeof(*wqe_hdr));
> +		wqe_hdr->wr_id = wr->wr_id;
> +		wqe_hdr->num_sge = wr->num_sge;
> +		wqe_hdr->opcode = ib_wr_opcode_to_pvrdma(wr->opcode);
> +		wqe_hdr->send_flags = ib_send_flags_to_pvrdma(wr->send_flags);
> +		if (wr->opcode == IB_WR_SEND_WITH_IMM ||
> +		    wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM)
> +			wqe_hdr->ex.imm_data = wr->ex.imm_data;
> +
> +		switch (qp->ibqp.qp_type) {
> +		case IB_QPT_GSI:
> +		case IB_QPT_UD:
> +			if (unlikely(!ud_wr(wr)->ah)) {
> +				dev_warn_ratelimited(&dev->pdev->dev,
> +						     "invalid address handle\n");
> +				*bad_wr = wr;
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +
> +			/*
> +			 * Use qkey from qp context if high order bit set,
> +			 * otherwise from work request.
> +			 */
> +			wqe_hdr->wr.ud.remote_qpn = ud_wr(wr)->remote_qpn;
> +			wqe_hdr->wr.ud.remote_qkey =
> +				ud_wr(wr)->remote_qkey & 0x80000000 ?
> +				qp->qkey : ud_wr(wr)->remote_qkey;
> +			wqe_hdr->wr.ud.av = to_vah(ud_wr(wr)->ah)->av;
> +
> +			break;
> +		case IB_QPT_RC:
> +			switch (wr->opcode) {
> +			case IB_WR_RDMA_READ:
> +			case IB_WR_RDMA_WRITE:
> +			case IB_WR_RDMA_WRITE_WITH_IMM:
> +				wqe_hdr->wr.rdma.remote_addr =
> +					rdma_wr(wr)->remote_addr;
> +				wqe_hdr->wr.rdma.rkey = rdma_wr(wr)->rkey;
> +				break;
> +			case IB_WR_LOCAL_INV:
> +			case IB_WR_SEND_WITH_INV:
> +				wqe_hdr->ex.invalidate_rkey =
> +					wr->ex.invalidate_rkey;
> +				break;
> +			case IB_WR_ATOMIC_CMP_AND_SWP:
> +			case IB_WR_ATOMIC_FETCH_AND_ADD:
> +				wqe_hdr->wr.atomic.remote_addr =
> +					atomic_wr(wr)->remote_addr;
> +				wqe_hdr->wr.atomic.rkey = atomic_wr(wr)->rkey;
> +				wqe_hdr->wr.atomic.compare_add =
> +					atomic_wr(wr)->compare_add;
> +				if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP)
> +					wqe_hdr->wr.atomic.swap =
> +						atomic_wr(wr)->swap;
> +				break;
> +			case IB_WR_REG_MR:
> +				ret = set_reg_seg(wqe_hdr, reg_wr(wr));
> +				if (ret < 0) {
> +					dev_warn_ratelimited(&dev->pdev->dev,
> +							     "Failed to set fast register work request\n");
> +					*bad_wr = wr;
> +					goto out;
> +				}
> +				break;
> +			default:
> +				break;
> +			}
> +
> +			break;
> +		default:
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "invalid queuepair type\n");
> +			ret = -EINVAL;
> +			*bad_wr = wr;
> +			goto out;
> +		}
> +
> +		sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> +		for (i = 0; i < wr->num_sge; i++) {
> +			/* Need to check wqe_size 0 or max size */
> +			sge->addr = wr->sg_list[i].addr;
> +			sge->length = wr->sg_list[i].length;
> +			sge->lkey = wr->sg_list[i].lkey;
> +			sge++;
> +		}
> +
> +		/* Make sure wqe is written before index update */
> +		smp_wmb();
> +
> +		index++;
> +		if (unlikely(index >= qp->sq.wqe_cnt))
> +			index = 0;
> +		/* Update shared sq ring */
> +		pvrdma_idx_ring_inc(&qp->sq.ring->prod_tail,
> +				    qp->sq.wqe_cnt);
> +	}
> +
> +	ret = 0;
> +
> +out:
> +	spin_unlock_irqrestore(&qp->sq.lock, flags);
> +
> +	if (!ret)
> +		pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_SEND | qp->qp_handle);
> +
> +	return ret;
> +}
> +
> +/**
> + * pvrdma_post_receive - post receive work request entries on a QP
> + * @ibqp: the QP
> + * @wr: the work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
> +		     struct ib_recv_wr **bad_wr)
> +{
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	unsigned long flags;
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	struct pvrdma_rq_wqe_hdr *wqe_hdr;
> +	struct pvrdma_sge *sge;
> +	int index, nreq;
> +	int ret = 0;
> +	int i;
> +
> +	/*
> +	 * In the RESET state, we can fail immediately. For other states,
> +	 * just post and let the device figure it out.
> +	 */
> +	if (qp->state == IB_QPS_RESET) {
> +		*bad_wr = wr;
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_irqsave(&qp->rq.lock, flags);
> +
> +	index = pvrdma_idx(&qp->rq.ring->prod_tail, qp->rq.wqe_cnt);
> +	for (nreq = 0; wr; nreq++, wr = wr->next) {
> +		unsigned int tail;
> +
> +		if (unlikely(wr->num_sge > qp->rq.max_sg ||
> +			     wr->num_sge < 0)) {
> +			ret = -EINVAL;
> +			*bad_wr = wr;
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "recv SGE overflow\n");
> +			goto out;
> +		}
> +
> +		if (unlikely(!pvrdma_idx_ring_has_space(
> +				qp->rq.ring, qp->rq.wqe_cnt, &tail))) {
> +			ret = -ENOMEM;
> +			*bad_wr = wr;
> +			dev_warn_ratelimited(&dev->pdev->dev,
> +					     "recv queue full\n");
> +			goto out;
> +		}
> +
> +		wqe_hdr = (struct pvrdma_rq_wqe_hdr *)get_rq_wqe(qp, index);
> +		wqe_hdr->wr_id = wr->wr_id;
> +		wqe_hdr->num_sge = wr->num_sge;
> +		wqe_hdr->total_len = 0;
> +
> +		sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> +		for (i = 0; i < wr->num_sge; i++) {
> +			sge->addr = wr->sg_list[i].addr;
> +			sge->length = wr->sg_list[i].length;
> +			sge->lkey = wr->sg_list[i].lkey;
> +			sge++;
> +		}
> +
> +		/* Make sure wqe is written before index update */
> +		smp_wmb();
> +
> +		index++;
> +		if (unlikely(index >= qp->rq.wqe_cnt))
> +			index = 0;
> +		/* Update shared rq ring */
> +		pvrdma_idx_ring_inc(&qp->rq.ring->prod_tail,
> +				    qp->rq.wqe_cnt);
> +	}
> +
> +	spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> +	pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_RECV | qp->qp_handle);
> +
> +	return ret;
> +
> +out:
> +	spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> +	return ret;
> +}
> +
> +/**
> + * pvrdma_query_qp - query a queue pair's attributes
> + * @ibqp: the queue pair to query
> + * @attr: the queue pair's attributes
> + * @attr_mask: attributes mask
> + * @init_attr: initial queue pair attributes
> + *
> + * @returns 0 on success, otherwise returns an errno.
> + */
> +int pvrdma_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
> +		    int attr_mask, struct ib_qp_init_attr *init_attr)
> +{
> +	struct pvrdma_dev *dev = to_vdev(ibqp->device);
> +	struct pvrdma_qp *qp = to_vqp(ibqp);
> +	union pvrdma_cmd_req req;
> +	union pvrdma_cmd_resp rsp;
> +	struct pvrdma_cmd_query_qp *cmd = &req.query_qp;
> +	struct pvrdma_cmd_query_qp_resp *resp = &rsp.query_qp_resp;
> +	int ret = 0;
> +
> +	mutex_lock(&qp->mutex);
> +
> +	if (qp->state == IB_QPS_RESET) {
> +		attr->qp_state = IB_QPS_RESET;
> +		goto out;
> +	}
> +
> +	memset(cmd, 0, sizeof(*cmd));
> +	cmd->hdr.cmd = PVRDMA_CMD_QUERY_QP;
> +	cmd->qp_handle = qp->qp_handle;
> +	cmd->attr_mask = ib_qp_attr_mask_to_pvrdma(attr_mask);
> +
> +	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_QUERY_QP_RESP);
> +	if (ret < 0) {
> +		dev_warn(&dev->pdev->dev,
> +			 "could not query queuepair, error: %d\n", ret);
> +		goto out;
> +	}
> +
> +	attr->qp_state = pvrdma_qp_state_to_ib(resp->attrs.qp_state);
> +	attr->cur_qp_state =
> +		pvrdma_qp_state_to_ib(resp->attrs.cur_qp_state);
> +	attr->path_mtu = pvrdma_mtu_to_ib(resp->attrs.path_mtu);
> +	attr->path_mig_state =
> +		pvrdma_mig_state_to_ib(resp->attrs.path_mig_state);
> +	attr->qkey = resp->attrs.qkey;
> +	attr->rq_psn = resp->attrs.rq_psn;
> +	attr->sq_psn = resp->attrs.sq_psn;
> +	attr->dest_qp_num = resp->attrs.dest_qp_num;
> +	attr->qp_access_flags =
> +		pvrdma_access_flags_to_ib(resp->attrs.qp_access_flags);
> +	attr->pkey_index = resp->attrs.pkey_index;
> +	attr->alt_pkey_index = resp->attrs.alt_pkey_index;
> +	attr->en_sqd_async_notify = resp->attrs.en_sqd_async_notify;
> +	attr->sq_draining = resp->attrs.sq_draining;
> +	attr->max_rd_atomic = resp->attrs.max_rd_atomic;
> +	attr->max_dest_rd_atomic = resp->attrs.max_dest_rd_atomic;
> +	attr->min_rnr_timer = resp->attrs.min_rnr_timer;
> +	attr->port_num = resp->attrs.port_num;
> +	attr->timeout = resp->attrs.timeout;
> +	attr->retry_cnt = resp->attrs.retry_cnt;
> +	attr->rnr_retry = resp->attrs.rnr_retry;
> +	attr->alt_port_num = resp->attrs.alt_port_num;
> +	attr->alt_timeout = resp->attrs.alt_timeout;
> +	pvrdma_qp_cap_to_ib(&attr->cap, &resp->attrs.cap);
> +	pvrdma_ah_attr_to_ib(&attr->ah_attr, &resp->attrs.ah_attr);
> +	pvrdma_ah_attr_to_ib(&attr->alt_ah_attr, &resp->attrs.alt_ah_attr);
> +
> +	qp->state = attr->qp_state;
> +
> +	ret = 0;
> +
> +out:
> +	attr->cur_qp_state = attr->qp_state;
> +
> +	init_attr->event_handler = qp->ibqp.event_handler;
> +	init_attr->qp_context = qp->ibqp.qp_context;
> +	init_attr->send_cq = qp->ibqp.send_cq;
> +	init_attr->recv_cq = qp->ibqp.recv_cq;
> +	init_attr->srq = qp->ibqp.srq;
> +	init_attr->xrcd = NULL;
> +	init_attr->cap = attr->cap;
> +	init_attr->sq_sig_type = 0;
> +	init_attr->qp_type = qp->ibqp.qp_type;
> +	init_attr->create_flags = 0;
> +	init_attr->port_num = qp->port;
> +
> +	mutex_unlock(&qp->mutex);
> +	return ret;
> +}
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-14 11:07 UTC (permalink / raw)
  To: Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Leon, All,
While inspecting MLX code as well as other vendors' I see that the actual number of cq->cqe is configured to be less by 1 than 'entries'. Why is that?

e.g.
struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
                                const struct ib_cq_init_attr *attr,
                                struct ib_ucontext *context,
                                struct ib_udata *udata)
{
        ...
        cq->ibcq.cqe = entries - 1;
        ...
}


static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
                                  int entries)
{
        ...
        cq->resize_buf->cqe = entries - 1;	// this is later copied to cq->ibcq.cqe
        ...
}

Thanks,
Ram

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR
From: Leon Romanovsky @ 2016-11-14  6:00 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Christian Benvenuti, kernel-janitors, Dave Goodell, Doug Ledford,
	Sean Hefty, Hal Rosenstock, linux-rdma, linux-kernel,
	Christophe JAILLET
In-Reply-To: <1478891066-16093-1-git-send-email-Julia.Lawall@lip6.fr>

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Fri, Nov 11, 2016 at 08:04:26PM +0100, Julia Lawall wrote:
> The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a
> valid pointer, never NULL.  The same is true of get_qp_res_chunk, which
> just returns the result of calling usnic_ib_qp_grp_get_chunk.  Simplify
> IS_ERR_OR_NULL to IS_ERR in both cases.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
>
> // <smpl>
> @@
> expression t,e;
> @@
>
> t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
> ... when != t=e
> - IS_ERR_OR_NULL(t)
> + IS_ERR(t)
>
> @@
> expression t,e,e1;
> @@
>
> t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
> ... when != t=e
> ?- t ? PTR_ERR(t) : e1
> + PTR_ERR(t)
> ... when any
> // </smpl>
>
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

Thanks, Julia.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v5 rdma-core 0/7] libhns: userspace library for hns
From: Leon Romanovsky @ 2016-11-14  5:53 UTC (permalink / raw)
  To: Lijun Ou
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

On Sun, Nov 13, 2016 at 06:35:53PM +0800, Lijun Ou wrote:
> This patch series introduces userspace library for hns RoCE driver.
>
> changes v4 -> v5:
> 1. elminate the warning when CFLAGS equal to -m32
>
> changes v3 -> v4:
> 1. eliminate the warning by Travis CI testing
>
> changes v2 -> v3:
> 1. Fix the code style, for example, if (addr == NULL)
> 2. Fix the bug for hns_roce_u_reg_mr
>
> changes v1 -> v2:
> 1. Delete the min() definition and instead of ccan header
> 2. Delete the CHECK_C_SOURCE_COMPILES
> 3. sort the c file in rdma_provider()
> 4. Delete the unused code in hns_roce_u_db.h
>
> Lijun Ou (7):
>   libhns: Add initial main frame
>   libhns: Add verbs of querying device and querying port
>   libhns: Add verbs of pd and mr support
>   libhns: Add verbs of cq support
>   libhns: Add verbs of qp support
>   libhns: Add verbs of post_send and post_recv support
>   libhns: Add consolidated repo for userspace library of hns


Thanks, applied.
https://github.com/linux-rdma/rdma-core/pull/38

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH, RESEND] IB/srpt: Report login failures only once
From: Max Gurtovoy @ 2016-11-13 17:29 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: Nicholas A. Bellinger, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <7737d8fc-d41d-0755-d7d2-a3a2b9b6a76e-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>



On 11/12/2016 2:36 AM, Bart Van Assche wrote:
> Report the following message only once if no ACL has been configured
> yet for an initiator port:
>
> "Rejected login because no ACL has been configured yet for initiator %s.\n"
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>

Looks good.

Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-13 16:42 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: Elior, Ariel, Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477400039-16925-1-git-send-email-Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

> The recent introduction of qedr driver support in qede causes a GPF when
> probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids
> using an uninitialized pointer in such a case. Caught by the kernel test robot.
> 
> Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/net/ethernet/qlogic/qede/qede_roce.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_roce.c
> b/drivers/net/ethernet/qlogic/qede/qede_roce.c
> index 9867f96..4927271 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_roce.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_roce.c
> @@ -191,8 +191,8 @@ int qede_roce_register_driver(struct qedr_driver *drv)
>  	}
>  	mutex_unlock(&qedr_dev_list_lock);
> 
> -	DP_INFO(edev, "qedr: discovered and registered %d RoCE funcs\n",
> -		qedr_counter);
> +	pr_notice("qedr: discovered and registered %d RoCE funcs\n",
> +		  qedr_counter);
> 
>  	return 0;
>  }
> --
> 1.8.3.1


Hi Doug,
Can you update if this patch taken and if not, when do you expect it will be?

Thanks,
Ram

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5 rdma-core 7/7] libhns: Add consolidated repo for userspace library of hns
From: Lijun Ou @ 2016-11-13 10:36 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch configures the consolidated repo to build userspace
library of hns(libhns).

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3:
- No change over v2

v2:
- Delete the CHECK_C_SOURCE_COMPILES and sort the .c file

v1:
- The initial submit
---
 CMakeLists.txt               | 1 +
 MAINTAINERS                  | 6 ++++++
 README.md                    | 1 +
 providers/hns/CMakeLists.txt | 6 ++++++
 4 files changed, 14 insertions(+)
 create mode 100644 providers/hns/CMakeLists.txt

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0ac7477..3c2aa79 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -328,6 +328,7 @@ add_subdirectory(libibcm)
 add_subdirectory(providers/cxgb3)
 add_subdirectory(providers/cxgb4)
 add_subdirectory(providers/hfi1verbs)
+add_subdirectory(providers/hns)
 add_subdirectory(providers/i40iw)
 add_subdirectory(providers/ipathverbs)
 add_subdirectory(providers/mlx4)
diff --git a/MAINTAINERS b/MAINTAINERS
index d83de10..bc6eb50 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,6 +57,12 @@ S:	Supported
 L:	intel-opa-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org (moderated for non-subscribers)
 F:	providers/hfi1verbs/
 
+HNS USERSPACE PROVIDER (for hns-roce.ko)
+M:	Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+M:	Wei Hu(Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+S:	Supported
+F:	providers/hns/
+
 I40IW USERSPACE PROVIDER (for i40iw.ko)
 M:	Tatyana Nikolova <Tatyana.E.Nikolova-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
 S:	Supported
diff --git a/README.md b/README.md
index 3a13042..e3bc33f 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ is included:
  - iw_cxgb3.ko
  - iw_cxgb4.ko
  - hfi1.ko
+ - hns-roce.ko
  - i40iw.ko
  - ib_qib.ko
  - mlx4_ib.ko
diff --git a/providers/hns/CMakeLists.txt b/providers/hns/CMakeLists.txt
new file mode 100644
index 0000000..19a793e
--- /dev/null
+++ b/providers/hns/CMakeLists.txt
@@ -0,0 +1,6 @@
+rdma_provider(hns
+  hns_roce_u.c
+  hns_roce_u_buf.c
+  hns_roce_u_hw_v1.c
+  hns_roce_u_verbs.c
+)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 6/7] libhns: Add verbs of post_send and post_recv support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the verbs of posting send
and psoting recv.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3/v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |   2 +
 providers/hns/hns_roce_u.h       |   8 +
 providers/hns/hns_roce_u_hw_v1.c | 314 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h |  79 ++++++++++
 4 files changed, 403 insertions(+)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index de2fd57..281f9f4 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -131,6 +131,8 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.query_qp	   = hns_roce_u_query_qp;
 	context->ibv_ctx.ops.modify_qp     = hr_dev->u_hw->modify_qp;
 	context->ibv_ctx.ops.destroy_qp    = hr_dev->u_hw->destroy_qp;
+	context->ibv_ctx.ops.post_send     = hr_dev->u_hw->post_send;
+	context->ibv_ctx.ops.post_recv     = hr_dev->u_hw->post_recv;
 
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 02b9251..4a6ed8e 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -51,6 +51,10 @@
 
 #define PFX				"hns: "
 
+#ifndef likely
+#define likely(x)     __builtin_expect(!!(x), 1)
+#endif
+
 #define roce_get_field(origin, mask, shift) \
 	(((origin) & (mask)) >> (shift))
 
@@ -171,6 +175,10 @@ struct hns_roce_qp {
 struct hns_roce_u_hw {
 	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
 	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+	int (*post_send)(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+			 struct ibv_send_wr **bad_wr);
+	int (*post_recv)(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+			 struct ibv_recv_wr **bad_wr);
 	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
 			 int attr_mask);
 	int (*destroy_qp)(struct ibv_qp *ibqp);
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index e5c7f6a..e5cfe48 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -37,6 +37,59 @@
 #include "hns_roce_u_hw_v1.h"
 #include "hns_roce_u.h"
 
+static inline void set_raddr_seg(struct hns_roce_wqe_raddr_seg *rseg,
+				 uint64_t remote_addr, uint32_t rkey)
+{
+	rseg->raddr    = remote_addr;
+	rseg->rkey     = rkey;
+	rseg->len      = 0;
+}
+
+static void set_data_seg(struct hns_roce_wqe_data_seg *dseg, struct ibv_sge *sg)
+{
+
+	dseg->lkey = sg->lkey;
+	dseg->addr = sg->addr;
+	dseg->len = sg->length;
+}
+
+static void hns_roce_update_rq_head(struct hns_roce_context *ctx,
+				    unsigned int qpn, unsigned int rq_head)
+{
+	struct hns_roce_rq_db rq_db;
+
+	rq_db.u32_4 = 0;
+	rq_db.u32_8 = 0;
+
+	roce_set_field(rq_db.u32_4, RQ_DB_U32_4_RQ_HEAD_M,
+		       RQ_DB_U32_4_RQ_HEAD_S, rq_head);
+	roce_set_field(rq_db.u32_8, RQ_DB_U32_8_QPN_M, RQ_DB_U32_8_QPN_S, qpn);
+	roce_set_field(rq_db.u32_8, RQ_DB_U32_8_CMD_M, RQ_DB_U32_8_CMD_S, 1);
+	roce_set_bit(rq_db.u32_8, RQ_DB_U32_8_HW_SYNC_S, 1);
+
+	hns_roce_write64((uint32_t *)&rq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_update_sq_head(struct hns_roce_context *ctx,
+				    unsigned int qpn, unsigned int port,
+				    unsigned int sl, unsigned int sq_head)
+{
+	struct hns_roce_sq_db sq_db;
+
+	sq_db.u32_4 = 0;
+	sq_db.u32_8 = 0;
+
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SQ_HEAD_M,
+		       SQ_DB_U32_4_SQ_HEAD_S, sq_head);
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_PORT_M, SQ_DB_U32_4_PORT_S,
+		       port);
+	roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SL_M, SQ_DB_U32_4_SL_S, sl);
+	roce_set_field(sq_db.u32_8, SQ_DB_U32_8_QPN_M, SQ_DB_U32_8_QPN_S, qpn);
+	roce_set_bit(sq_db.u32_8, SQ_DB_U32_8_HW_SYNC, 1);
+
+	hns_roce_write64((uint32_t *)&sq_db, ctx, ROCEE_DB_SQ_L_0_REG);
+}
+
 static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
 					  struct hns_roce_cq *cq)
 {
@@ -126,6 +179,16 @@ static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
 	return get_sw_cqe(cq, cq->cons_index);
 }
 
+static void *get_recv_wqe(struct hns_roce_qp *qp, int n)
+{
+	if ((n < 0) || (n > qp->rq.wqe_cnt)) {
+		printf("rq wqe index:%d,rq wqe cnt:%d\r\n", n, qp->rq.wqe_cnt);
+		return NULL;
+	}
+
+	return qp->buf.buf + qp->rq.offset + (n << qp->rq.wqe_shift);
+}
+
 static void *get_send_wqe(struct hns_roce_qp *qp, int n)
 {
 	if ((n < 0) || (n > qp->sq.wqe_cnt)) {
@@ -136,6 +199,26 @@ static void *get_send_wqe(struct hns_roce_qp *qp, int n)
 	return (void *)(qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift));
 }
 
+static int hns_roce_wq_overflow(struct hns_roce_wq *wq, int nreq,
+				struct hns_roce_cq *cq)
+{
+	unsigned int cur;
+
+	cur = wq->head - wq->tail;
+	if (cur + nreq < wq->max_post)
+		return 0;
+
+	/* While the num of wqe exceeds cap of the device, cq will be locked */
+	pthread_spin_lock(&cq->lock);
+	cur = wq->head - wq->tail;
+	pthread_spin_unlock(&cq->lock);
+
+	printf("wq:(head = %d, tail = %d, max_post = %d), nreq = 0x%x\n",
+		wq->head, wq->tail, wq->max_post, nreq);
+
+	return cur + nreq >= wq->max_post;
+}
+
 static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
 					    uint32_t qpn)
 {
@@ -372,6 +455,144 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
 	return 0;
 }
 
+static int hns_roce_u_v1_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+				   struct ibv_send_wr **bad_wr)
+{
+	unsigned int ind;
+	void *wqe;
+	int nreq;
+	int ps_opcode, i;
+	int ret = 0;
+	struct hns_roce_wqe_ctrl_seg *ctrl = NULL;
+	struct hns_roce_wqe_data_seg *dseg = NULL;
+	struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+	pthread_spin_lock(&qp->sq.lock);
+
+	/* check that state is OK to post send */
+	ind = qp->sq.head;
+
+	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (hns_roce_wq_overflow(&qp->sq, nreq,
+					 to_hr_cq(qp->ibv_qp.send_cq))) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+		if (wr->num_sge > qp->sq.max_gs) {
+			ret = -1;
+			*bad_wr = wr;
+			printf("wr->num_sge(<=%d) = %d, check failed!\r\n",
+				qp->sq.max_gs, wr->num_sge);
+			goto out;
+		}
+
+		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+		memset(ctrl, 0, sizeof(struct hns_roce_wqe_ctrl_seg));
+
+		qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
+		for (i = 0; i < wr->num_sge; i++)
+			ctrl->msg_length += wr->sg_list[i].length;
+
+
+		ctrl->flag |= ((wr->send_flags & IBV_SEND_SIGNALED) ?
+				HNS_ROCE_WQE_CQ_NOTIFY : 0) |
+			      (wr->send_flags & IBV_SEND_SOLICITED ?
+				HNS_ROCE_WQE_SE : 0) |
+			      ((wr->opcode == IBV_WR_SEND_WITH_IMM ||
+			       wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ?
+				HNS_ROCE_WQE_IMM : 0) |
+			      (wr->send_flags & IBV_SEND_FENCE ?
+				HNS_ROCE_WQE_FENCE : 0);
+
+		if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
+		    wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
+			ctrl->imm_data = wr->imm_data;
+
+		wqe += sizeof(struct hns_roce_wqe_ctrl_seg);
+
+		/* set remote addr segment */
+		switch (ibvqp->qp_type) {
+		case IBV_QPT_RC:
+			switch (wr->opcode) {
+			case IBV_WR_RDMA_READ:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_READ;
+				set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+					      wr->wr.rdma.rkey);
+				break;
+			case IBV_WR_RDMA_WRITE:
+			case IBV_WR_RDMA_WRITE_WITH_IMM:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_WRITE;
+				set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+					      wr->wr.rdma.rkey);
+				break;
+			case IBV_WR_SEND:
+			case IBV_WR_SEND_WITH_IMM:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_SEND;
+				break;
+			case IBV_WR_ATOMIC_CMP_AND_SWP:
+			case IBV_WR_ATOMIC_FETCH_AND_ADD:
+			default:
+				ps_opcode = HNS_ROCE_WQE_OPCODE_MASK;
+				break;
+			}
+			ctrl->flag |= (ps_opcode);
+			wqe  += sizeof(struct hns_roce_wqe_raddr_seg);
+			break;
+		case IBV_QPT_UC:
+		case IBV_QPT_UD:
+		default:
+			break;
+		}
+
+		dseg = wqe;
+
+		/* Inline */
+		if (wr->send_flags & IBV_SEND_INLINE && wr->num_sge) {
+			if (ctrl->msg_length > qp->max_inline_data) {
+				ret = -1;
+				*bad_wr = wr;
+				printf("inline data len(1-32)=%d, send_flags = 0x%x, check failed!\r\n",
+					wr->send_flags, ctrl->msg_length);
+				return ret;
+			}
+
+			for (i = 0; i < wr->num_sge; i++) {
+				memcpy(wqe,
+				     ((void *) (uintptr_t) wr->sg_list[i].addr),
+				     wr->sg_list[i].length);
+				wqe = wqe + wr->sg_list[i].length;
+			}
+
+			ctrl->flag |= HNS_ROCE_WQE_INLINE;
+		} else {
+			/* set sge */
+			for (i = 0; i < wr->num_sge; i++)
+				set_data_seg(dseg+i, wr->sg_list + i);
+
+			ctrl->flag |= wr->num_sge << HNS_ROCE_WQE_SGE_NUM_BIT;
+		}
+
+		ind++;
+	}
+
+out:
+	/* Set DB return */
+	if (likely(nreq)) {
+		qp->sq.head += nreq;
+		wmb();
+
+		hns_roce_update_sq_head(ctx, qp->ibv_qp.qp_num,
+				qp->port_num - 1, qp->sl,
+				qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
+	}
+
+	pthread_spin_unlock(&qp->sq.lock);
+
+	return ret;
+}
+
 static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
 				   struct hns_roce_srq *srq)
 {
@@ -515,9 +736,102 @@ static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
 	return ret;
 }
 
+static int hns_roce_u_v1_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+				   struct ibv_recv_wr **bad_wr)
+{
+	int ret = 0;
+	int nreq;
+	int ind;
+	struct ibv_sge *sg;
+	struct hns_roce_rc_rq_wqe *rq_wqe;
+	struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+	pthread_spin_lock(&qp->rq.lock);
+
+	/* check that state is OK to post receive */
+	ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
+
+	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (hns_roce_wq_overflow(&qp->rq, nreq,
+					 to_hr_cq(qp->ibv_qp.recv_cq))) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		if (wr->num_sge > qp->rq.max_gs) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		rq_wqe = get_recv_wqe(qp, ind);
+		if (wr->num_sge > HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+			ret = -1;
+			*bad_wr = wr;
+			goto out;
+		}
+
+		if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM);
+			sg = wr->sg_list;
+
+			rq_wqe->va0 = (sg->addr);
+			rq_wqe->l_key0 = (sg->lkey);
+			rq_wqe->length0 = (sg->length);
+
+			sg = wr->sg_list + 1;
+
+			rq_wqe->va1 = (sg->addr);
+			rq_wqe->l_key1 = (sg->lkey);
+			rq_wqe->length1 = (sg->length);
+		} else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1);
+			sg = wr->sg_list;
+
+			rq_wqe->va0 = (sg->addr);
+			rq_wqe->l_key0 = (sg->lkey);
+			rq_wqe->length0 = (sg->length);
+
+		} else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2) {
+			roce_set_field(rq_wqe->u32_2,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+				       RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+				       HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2);
+		}
+
+		qp->rq.wrid[ind] = wr->wr_id;
+
+		ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+	}
+
+out:
+	if (nreq) {
+		qp->rq.head += nreq;
+
+		wmb();
+
+		hns_roce_update_rq_head(ctx, qp->ibv_qp.qp_num,
+				    qp->rq.head & ((qp->rq.wqe_cnt << 1) - 1));
+	}
+
+	pthread_spin_unlock(&qp->rq.lock);
+
+	return ret;
+}
+
 struct hns_roce_u_hw hns_roce_u_hw_v1 = {
 	.poll_cq = hns_roce_u_v1_poll_cq,
 	.arm_cq = hns_roce_u_v1_arm_cq,
+	.post_send = hns_roce_u_v1_post_send,
+	.post_recv = hns_roce_u_v1_post_recv,
 	.modify_qp = hns_roce_u_v1_modify_qp,
 	.destroy_qp = hns_roce_u_v1_destroy_qp,
 };
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
index b249f54..128c66f 100644
--- a/providers/hns/hns_roce_u_hw_v1.h
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -39,9 +39,15 @@
 #define HNS_ROCE_CQE_IS_SQ			0
 
 #define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN	32
+#define HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM		2
 
 enum {
+	HNS_ROCE_WQE_INLINE		= 1 << 31,
+	HNS_ROCE_WQE_SE			= 1 << 30,
+	HNS_ROCE_WQE_SGE_NUM_BIT	= 24,
 	HNS_ROCE_WQE_IMM		= 1 << 23,
+	HNS_ROCE_WQE_FENCE		= 1 << 21,
+	HNS_ROCE_WQE_CQ_NOTIFY		 = 1 << 20,
 	HNS_ROCE_WQE_OPCODE_SEND        = 0 << 16,
 	HNS_ROCE_WQE_OPCODE_RDMA_READ   = 1 << 16,
 	HNS_ROCE_WQE_OPCODE_RDMA_WRITE  = 2 << 16,
@@ -52,6 +58,20 @@ enum {
 struct hns_roce_wqe_ctrl_seg {
 	__be32		sgl_pa_h;
 	__be32		flag;
+	__be32		imm_data;
+	__be32		msg_length;
+};
+
+struct hns_roce_wqe_data_seg {
+	__be64		addr;
+	__be32		lkey;
+	__be32		len;
+};
+
+struct hns_roce_wqe_raddr_seg {
+	__be32		rkey;
+	__be32		len;
+	__be64		raddr;
 };
 
 enum {
@@ -102,6 +122,43 @@ struct hns_roce_cq_db {
 
 #define CQ_DB_U32_8_HW_SYNC_S 31
 
+struct hns_roce_rq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+
+#define RQ_DB_U32_4_RQ_HEAD_S 0
+#define RQ_DB_U32_4_RQ_HEAD_M   (((1UL << 15) - 1) << RQ_DB_U32_4_RQ_HEAD_S)
+
+#define RQ_DB_U32_8_QPN_S 0
+#define RQ_DB_U32_8_QPN_M   (((1UL << 24) - 1) << RQ_DB_U32_8_QPN_S)
+
+#define RQ_DB_U32_8_CMD_S 28
+#define RQ_DB_U32_8_CMD_M   (((1UL << 3) - 1) << RQ_DB_U32_8_CMD_S)
+
+#define RQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_sq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+
+#define SQ_DB_U32_4_SQ_HEAD_S 0
+#define SQ_DB_U32_4_SQ_HEAD_M (((1UL << 15) - 1) << SQ_DB_U32_4_SQ_HEAD_S)
+
+#define SQ_DB_U32_4_SL_S 16
+#define SQ_DB_U32_4_SL_M (((1UL << 2) - 1) << SQ_DB_U32_4_SL_S)
+
+#define SQ_DB_U32_4_PORT_S 18
+#define SQ_DB_U32_4_PORT_M (((1UL << 3) - 1) << SQ_DB_U32_4_PORT_S)
+
+#define SQ_DB_U32_4_DIRECT_WQE_S 31
+
+#define SQ_DB_U32_8_QPN_S 0
+#define SQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << SQ_DB_U32_8_QPN_S)
+
+#define SQ_DB_U32_8_HW_SYNC 31
+
 struct hns_roce_cqe {
 	unsigned int cqe_byte_4;
 	union {
@@ -160,4 +217,26 @@ struct hns_roce_rc_send_wqe {
 	unsigned int length1;
 };
 
+struct hns_roce_rc_rq_wqe {
+	unsigned int u32_0;
+	unsigned int sgl_ba_31_0;
+	unsigned int u32_2;
+	unsigned int rvd_5;
+	unsigned int rvd_6;
+	unsigned int rvd_7;
+	unsigned int rvd_8;
+	unsigned int rvd_9;
+
+	uint64_t     va0;
+	unsigned int l_key0;
+	unsigned int length0;
+
+	uint64_t     va1;
+	unsigned int l_key1;
+	unsigned int length1;
+};
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_S 16
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_M \
+	(((1UL << 6) - 1) << RC_RQ_WQE_NUMBER_OF_DATA_SEG_S)
+
 #endif /* _HNS_ROCE_U_HW_V1_H */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 5/7] libhns: Add verbs of qp support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the relatived qp verbs for userspace
library of hns, include:
    1. create_qp
    2. query_qp
    3. modify_qp
    4. destroy_qp

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3:
- No change over v2

v2:
- Delete the min() and use the ccan header

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |   5 +
 providers/hns/hns_roce_u.h       |  45 +++++++
 providers/hns/hns_roce_u_abi.h   |   8 ++
 providers/hns/hns_roce_u_hw_v1.c | 155 +++++++++++++++++++++++
 providers/hns/hns_roce_u_verbs.c | 259 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 471 insertions(+), 1 deletion(-)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 1877218..de2fd57 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -127,6 +127,11 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.cq_event	   = hns_roce_u_cq_event;
 	context->ibv_ctx.ops.destroy_cq    = hns_roce_u_destroy_cq;
 
+	context->ibv_ctx.ops.create_qp     = hns_roce_u_create_qp;
+	context->ibv_ctx.ops.query_qp	   = hns_roce_u_query_qp;
+	context->ibv_ctx.ops.modify_qp     = hr_dev->u_hw->modify_qp;
+	context->ibv_ctx.ops.destroy_qp    = hr_dev->u_hw->destroy_qp;
+
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
 
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index c3e364d..02b9251 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -44,6 +44,7 @@
 
 #define HNS_ROCE_MAX_CQ_NUM		0x10000
 #define HNS_ROCE_MIN_CQE_NUM		0x40
+#define HNS_ROCE_MIN_WQE_NUM		0x20
 #define HNS_ROCE_CQ_DB_BUF_SIZE		((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
 #define HNS_ROCE_TPTR_OFFSET		0x1000
 #define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
@@ -128,10 +129,29 @@ struct hns_roce_cq {
 	int				arm_sn;
 };
 
+struct hns_roce_srq {
+	struct ibv_srq			ibv_srq;
+	struct hns_roce_buf		buf;
+	pthread_spinlock_t		lock;
+	unsigned long			*wrid;
+	unsigned int			srqn;
+	int				max;
+	unsigned int			max_gs;
+	int				wqe_shift;
+	int				head;
+	int				tail;
+	unsigned int			*db;
+	unsigned short			counter;
+};
+
 struct hns_roce_wq {
 	unsigned long			*wrid;
+	pthread_spinlock_t		lock;
 	unsigned int			wqe_cnt;
+	int				max_post;
+	unsigned int			head;
 	unsigned int			tail;
+	unsigned int			max_gs;
 	int				wqe_shift;
 	int				offset;
 };
@@ -139,14 +159,21 @@ struct hns_roce_wq {
 struct hns_roce_qp {
 	struct ibv_qp			ibv_qp;
 	struct hns_roce_buf		buf;
+	int				max_inline_data;
+	int				buf_size;
 	unsigned int			sq_signal_bits;
 	struct hns_roce_wq		sq;
 	struct hns_roce_wq		rq;
+	int				port_num;
+	int				sl;
 };
 
 struct hns_roce_u_hw {
 	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
 	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	int (*destroy_qp)(struct ibv_qp *ibqp);
 };
 
 static inline unsigned long align(unsigned long val, unsigned long align)
@@ -174,6 +201,16 @@ static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
 	return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
 }
 
+static inline struct hns_roce_srq *to_hr_srq(struct ibv_srq *ibv_srq)
+{
+	return container_of(ibv_srq, struct hns_roce_srq, ibv_srq);
+}
+
+static inline struct  hns_roce_qp *to_hr_qp(struct ibv_qp *ibv_qp)
+{
+	return container_of(ibv_qp, struct hns_roce_qp, ibv_qp);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -193,10 +230,18 @@ struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
 int hns_roce_u_destroy_cq(struct ibv_cq *cq);
 void hns_roce_u_cq_event(struct ibv_cq *cq);
 
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *attr);
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+			int attr_mask, struct ibv_qp_init_attr *init_attr);
+
 int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
 		       int page_size);
 void hns_roce_free_buf(struct hns_roce_buf *buf);
 
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp);
+
 extern struct hns_roce_u_hw hns_roce_u_hw_v1;
 
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 1e62a7e..e78f967 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -58,4 +58,12 @@ struct hns_roce_create_cq_resp {
 	__u32				reserved;
 };
 
+struct hns_roce_create_qp {
+	struct ibv_create_qp		ibv_cmd;
+	__u64				buf_addr;
+	__u8				log_sq_bb_count;
+	__u8				log_sq_stride;
+	__u8				reserved[5];
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index 39a67b1..e5c7f6a 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -149,6 +149,16 @@ static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
 	}
 }
 
+static void hns_roce_clear_qp(struct hns_roce_context *ctx, uint32_t qpn)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (!--ctx->qp_table[tind].refcnt)
+		free(ctx->qp_table[tind].table);
+	else
+		ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL;
+}
+
 static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
 				struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
 {
@@ -362,7 +372,152 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
 	return 0;
 }
 
+static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
+				   struct hns_roce_srq *srq)
+{
+	int nfreed = 0;
+	uint32_t prod_index;
+	uint8_t owner_bit = 0;
+	struct hns_roce_cqe *cqe, *dest;
+	struct hns_roce_context *ctx = to_hr_ctx(cq->ibv_cq.context);
+
+	for (prod_index = cq->cons_index; get_sw_cqe(cq, prod_index);
+	     ++prod_index)
+		if (prod_index == cq->cons_index + cq->ibv_cq.cqe)
+			break;
+
+	while ((int) --prod_index - (int) cq->cons_index >= 0) {
+		cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
+		if ((roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+			      CQE_BYTE_16_LOCAL_QPN_S) & 0xffffff) == qpn) {
+			++nfreed;
+		} else if (nfreed) {
+			dest = get_cqe(cq,
+				       (prod_index + nfreed) & cq->ibv_cq.cqe);
+			owner_bit = roce_get_bit(dest->cqe_byte_4,
+						 CQE_BYTE_4_OWNER_S);
+			memcpy(dest, cqe, sizeof(*cqe));
+			roce_set_bit(dest->cqe_byte_4, CQE_BYTE_4_OWNER_S,
+				     owner_bit);
+		}
+	}
+
+	if (nfreed) {
+		cq->cons_index += nfreed;
+		wmb();
+		hns_roce_update_cq_cons_index(ctx, cq);
+	}
+}
+
+static void hns_roce_v1_cq_clean(struct hns_roce_cq *cq, unsigned int qpn,
+				 struct hns_roce_srq *srq)
+{
+	pthread_spin_lock(&cq->lock);
+	__hns_roce_v1_cq_clean(cq, qpn, srq);
+	pthread_spin_unlock(&cq->lock);
+}
+
+static int hns_roce_u_v1_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+				   int attr_mask)
+{
+	int ret;
+	struct ibv_modify_qp cmd;
+	struct hns_roce_qp *hr_qp = to_hr_qp(qp);
+
+	ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+	if (!ret && (attr_mask & IBV_QP_STATE) &&
+	    attr->qp_state == IBV_QPS_RESET) {
+		hns_roce_v1_cq_clean(to_hr_cq(qp->recv_cq), qp->qp_num,
+				     qp->srq ? to_hr_srq(qp->srq) : NULL);
+		if (qp->send_cq != qp->recv_cq)
+			hns_roce_v1_cq_clean(to_hr_cq(qp->send_cq), qp->qp_num,
+					     NULL);
+
+		hns_roce_init_qp_indices(to_hr_qp(qp));
+	}
+
+	if (!ret && (attr_mask & IBV_QP_PORT)) {
+		hr_qp->port_num = attr->port_num;
+		printf("hr_qp->port_num= 0x%x\n", hr_qp->port_num);
+	}
+
+	hr_qp->sl = attr->ah_attr.sl;
+
+	return ret;
+}
+
+static void hns_roce_lock_cqs(struct ibv_qp *qp)
+{
+	struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+	struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+	if (send_cq == recv_cq) {
+		pthread_spin_lock(&send_cq->lock);
+	} else if (send_cq->cqn < recv_cq->cqn) {
+		pthread_spin_lock(&send_cq->lock);
+		pthread_spin_lock(&recv_cq->lock);
+	} else {
+		pthread_spin_lock(&recv_cq->lock);
+		pthread_spin_lock(&send_cq->lock);
+	}
+}
+
+static void hns_roce_unlock_cqs(struct ibv_qp *qp)
+{
+	struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+	struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+	if (send_cq == recv_cq) {
+		pthread_spin_unlock(&send_cq->lock);
+	} else if (send_cq->cqn < recv_cq->cqn) {
+		pthread_spin_unlock(&recv_cq->lock);
+		pthread_spin_unlock(&send_cq->lock);
+	} else {
+		pthread_spin_unlock(&send_cq->lock);
+		pthread_spin_unlock(&recv_cq->lock);
+	}
+}
+
+static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
+{
+	int ret;
+	struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+	pthread_mutex_lock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+	ret = ibv_cmd_destroy_qp(ibqp);
+	if (ret) {
+		pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+		return ret;
+	}
+
+	hns_roce_lock_cqs(ibqp);
+
+	__hns_roce_v1_cq_clean(to_hr_cq(ibqp->recv_cq), ibqp->qp_num,
+			       ibqp->srq ? to_hr_srq(ibqp->srq) : NULL);
+
+	if (ibqp->send_cq != ibqp->recv_cq)
+		__hns_roce_v1_cq_clean(to_hr_cq(ibqp->send_cq), ibqp->qp_num,
+				       NULL);
+
+	hns_roce_clear_qp(to_hr_ctx(ibqp->context), ibqp->qp_num);
+
+	hns_roce_unlock_cqs(ibqp);
+	pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+
+	free(qp->sq.wrid);
+	if (qp->rq.wqe_cnt)
+		free(qp->rq.wrid);
+
+	hns_roce_free_buf(&qp->buf);
+	free(qp);
+
+	return ret;
+}
+
 struct hns_roce_u_hw hns_roce_u_hw_v1 = {
 	.poll_cq = hns_roce_u_v1_poll_cq,
 	.arm_cq = hns_roce_u_v1_arm_cq,
+	.modify_qp = hns_roce_u_v1_modify_qp,
+	.destroy_qp = hns_roce_u_v1_destroy_qp,
 };
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index c9324dd..0b8f444 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -38,11 +38,19 @@
 #include <sys/mman.h>
 #include <fcntl.h>
 #include <unistd.h>
-
+#include <ccan/minmax.h>
 #include "hns_roce_u.h"
 #include "hns_roce_u_abi.h"
 #include "hns_roce_u_hw_v1.h"
 
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp)
+{
+	qp->sq.head = 0;
+	qp->sq.tail = 0;
+	qp->rq.head = 0;
+	qp->rq.tail = 0;
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr)
 {
@@ -163,6 +171,29 @@ static int align_cq_size(int req)
 	return nent;
 }
 
+static int align_qp_size(int req)
+{
+	int nent;
+
+	for (nent = HNS_ROCE_MIN_WQE_NUM; nent < req; nent <<= 1)
+		;
+
+	return nent;
+}
+
+static void hns_roce_set_sq_sizes(struct hns_roce_qp *qp,
+				  struct ibv_qp_cap *cap, enum ibv_qp_type type)
+{
+	struct hns_roce_context *ctx = to_hr_ctx(qp->ibv_qp.context);
+
+	qp->sq.max_gs = 2;
+	cap->max_send_sge = min(ctx->max_sge, qp->sq.max_gs);
+	qp->sq.max_post = min(ctx->max_qp_wr, qp->sq.wqe_cnt);
+	cap->max_send_wr = qp->sq.max_post;
+	qp->max_inline_data  = 32;
+	cap->max_inline_data = qp->max_inline_data;
+}
+
 static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
 {
 	if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
@@ -189,6 +220,17 @@ static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
 	return 0;
 }
 
+static void hns_roce_calc_sq_wqe_size(struct ibv_qp_cap *cap,
+				      enum ibv_qp_type type,
+				      struct hns_roce_qp *qp)
+{
+	int size = sizeof(struct hns_roce_rc_send_wqe);
+
+	for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
+	     qp->sq.wqe_shift++)
+		;
+}
+
 struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
 				    struct ibv_comp_channel *channel,
 				    int comp_vector)
@@ -266,3 +308,218 @@ int hns_roce_u_destroy_cq(struct ibv_cq *cq)
 
 	return ret;
 }
+
+static int hns_roce_verify_qp(struct ibv_qp_init_attr *attr,
+			      struct hns_roce_context *context)
+{
+	if (attr->cap.max_send_wr < HNS_ROCE_MIN_WQE_NUM) {
+		fprintf(stderr,
+			"max_send_wr = %d, less than minimum WQE number.\n",
+			attr->cap.max_send_wr);
+		attr->cap.max_send_wr = HNS_ROCE_MIN_WQE_NUM;
+	}
+
+	if (attr->cap.max_recv_wr < HNS_ROCE_MIN_WQE_NUM) {
+		fprintf(stderr,
+			"max_recv_wr = %d, less than minimum WQE number.\n",
+			attr->cap.max_recv_wr);
+		attr->cap.max_recv_wr = HNS_ROCE_MIN_WQE_NUM;
+	}
+
+	if (attr->cap.max_recv_sge < 1)
+		attr->cap.max_recv_sge = 1;
+	if (attr->cap.max_send_wr > context->max_qp_wr ||
+	    attr->cap.max_recv_wr > context->max_qp_wr ||
+	    attr->cap.max_send_sge > context->max_sge  ||
+	    attr->cap.max_recv_sge > context->max_sge)
+		return -1;
+
+	if ((attr->qp_type != IBV_QPT_RC) && (attr->qp_type != IBV_QPT_UD))
+		return -1;
+
+	if ((attr->qp_type == IBV_QPT_RC) &&
+	    (attr->cap.max_inline_data > HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN))
+		return -1;
+
+	if (attr->qp_type == IBV_QPT_UC)
+		return -1;
+
+	return 0;
+}
+
+static int hns_roce_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
+				 enum ibv_qp_type type, struct hns_roce_qp *qp)
+{
+	qp->sq.wrid =
+		(unsigned long *)malloc(qp->sq.wqe_cnt * sizeof(uint64_t));
+	if (!qp->sq.wrid)
+		return -1;
+
+	if (qp->rq.wqe_cnt) {
+		qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+		if (!qp->rq.wrid) {
+			free(qp->sq.wrid);
+			return -1;
+		}
+	}
+
+	for (qp->rq.wqe_shift = 4;
+	     1 << qp->rq.wqe_shift < sizeof(struct hns_roce_rc_send_wqe);
+	     qp->rq.wqe_shift++)
+		;
+
+	qp->buf_size = align((qp->sq.wqe_cnt << qp->sq.wqe_shift), 0x1000) +
+		      (qp->rq.wqe_cnt << qp->rq.wqe_shift);
+
+	if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
+		qp->rq.offset = 0;
+		qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
+	} else {
+		qp->rq.offset = align((qp->sq.wqe_cnt << qp->sq.wqe_shift),
+				       0x1000);
+		qp->sq.offset = 0;
+	}
+
+	if (hns_roce_alloc_buf(&qp->buf, align(qp->buf_size, 0x1000),
+			       to_hr_dev(pd->context->device)->page_size)) {
+		free(qp->sq.wrid);
+		free(qp->rq.wrid);
+		return -1;
+	}
+
+	memset(qp->buf.buf, 0, qp->buf_size);
+
+	return 0;
+}
+
+static int hns_roce_store_qp(struct hns_roce_context *ctx, uint32_t qpn,
+			     struct hns_roce_qp *qp)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (!ctx->qp_table[tind].refcnt) {
+		ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1,
+						  sizeof(struct hns_roce_qp *));
+		if (!ctx->qp_table[tind].table)
+			return -1;
+	}
+
+	++ctx->qp_table[tind].refcnt;
+	ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp;
+
+	return 0;
+}
+
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *attr)
+{
+	int ret;
+	struct hns_roce_qp *qp = NULL;
+	struct hns_roce_create_qp cmd;
+	struct ibv_create_qp_resp resp;
+	struct hns_roce_context *context = to_hr_ctx(pd->context);
+
+	if (hns_roce_verify_qp(attr, context)) {
+		fprintf(stderr, "hns_roce_verify_sizes failed!\n");
+		return NULL;
+	}
+
+	qp = malloc(sizeof(*qp));
+	if (!qp) {
+		fprintf(stderr, "malloc failed!\n");
+		return NULL;
+	}
+
+	hns_roce_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
+	qp->sq.wqe_cnt = align_qp_size(attr->cap.max_send_wr);
+	qp->rq.wqe_cnt = align_qp_size(attr->cap.max_recv_wr);
+
+	if (hns_roce_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp)) {
+		fprintf(stderr, "hns_roce_alloc_qp_buf failed!\n");
+		goto err;
+	}
+
+	hns_roce_init_qp_indices(qp);
+
+	if (pthread_spin_init(&qp->sq.lock, PTHREAD_PROCESS_PRIVATE) ||
+	    pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) {
+		fprintf(stderr, "pthread_spin_init failed!\n");
+		goto err_free;
+	}
+
+	cmd.buf_addr = (uintptr_t) qp->buf.buf;
+	cmd.log_sq_stride = qp->sq.wqe_shift;
+	for (cmd.log_sq_bb_count = 0; qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count;
+	     ++cmd.log_sq_bb_count)
+		;
+
+	memset(cmd.reserved, 0, sizeof(cmd.reserved));
+
+	pthread_mutex_lock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+	ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd,
+				sizeof(cmd), &resp, sizeof(resp));
+	if (ret) {
+		fprintf(stderr, "ibv_cmd_create_qp failed!\n");
+		goto err_rq_db;
+	}
+
+	ret = hns_roce_store_qp(to_hr_ctx(pd->context), qp->ibv_qp.qp_num, qp);
+	if (ret) {
+		fprintf(stderr, "hns_roce_store_qp failed!\n");
+		goto err_destroy;
+	}
+	pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+	qp->rq.wqe_cnt = attr->cap.max_recv_wr;
+	qp->rq.max_gs	= attr->cap.max_recv_sge;
+
+	/* adjust rq maxima to not exceed reported device maxima */
+	attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
+	attr->cap.max_recv_sge = min(context->max_sge, attr->cap.max_recv_sge);
+
+	qp->rq.max_post = attr->cap.max_recv_wr;
+	hns_roce_set_sq_sizes(qp, &attr->cap, attr->qp_type);
+
+	qp->sq_signal_bits = attr->sq_sig_all ? 0 : 1;
+
+	return &qp->ibv_qp;
+
+err_destroy:
+	ibv_cmd_destroy_qp(&qp->ibv_qp);
+
+err_rq_db:
+	pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+err_free:
+	free(qp->sq.wrid);
+	if (qp->rq.wqe_cnt)
+		free(qp->rq.wrid);
+	hns_roce_free_buf(&qp->buf);
+
+err:
+	free(qp);
+
+	return NULL;
+}
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+			int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+	int ret;
+	struct ibv_query_qp cmd;
+	struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+	ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd,
+			       sizeof(cmd));
+	if (ret)
+		return ret;
+
+	init_attr->cap.max_send_wr = qp->sq.max_post;
+	init_attr->cap.max_send_sge = qp->sq.max_gs;
+	init_attr->cap.max_inline_data = qp->max_inline_data;
+
+	attr->cap = init_attr->cap;
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 4/7] libhns: Add verbs of cq support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the relatived cq verbs for userspace
of hns, include:
    1. create_cq
    2. poll_cq
    3. req_notify_cq
    4. cq_event
    5. destroy_cq

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5:
- Eliminate the warning when CFLAGS equal to -m32

v4:
- Eliminate the warning by Travis CI testing

v3:
- No change over the v2

v2:
- Delete the unused code

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  57 +++++-
 providers/hns/hns_roce_u.h       |  94 ++++++++++
 providers/hns/hns_roce_u_abi.h   |  12 ++
 providers/hns/hns_roce_u_buf.c   |  61 +++++++
 providers/hns/hns_roce_u_db.h    |  54 ++++++
 providers/hns/hns_roce_u_hw_v1.c | 368 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h | 163 +++++++++++++++++
 providers/hns/hns_roce_u_verbs.c | 116 ++++++++++++
 8 files changed, 920 insertions(+), 5 deletions(-)
 create mode 100644 providers/hns/hns_roce_u_buf.c
 create mode 100644 providers/hns/hns_roce_u_db.h
 create mode 100644 providers/hns/hns_roce_u_hw_v1.c
 create mode 100644 providers/hns/hns_roce_u_hw_v1.h

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 53e2720..1877218 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -46,15 +46,19 @@
 
 static const struct {
 	char	 hid[HID_LEN];
+	void	 *data;
+	int	 version;
 } acpi_table[] = {
-	{"acpi:HISI00D1:"},
-	{},
+	 {"acpi:HISI00D1:", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
+	 {},
 };
 
 static const struct {
 	char	 compatible[DEV_MATCH_LEN];
+	void	 *data;
+	int	 version;
 } dt_table[] = {
-	{"hisilicon,hns-roce-v1"},
+	{"hisilicon,hns-roce-v1", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
 	{},
 };
 
@@ -93,6 +97,21 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 		goto err_free;
 	}
 
+	if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+		/*
+		 * when vma->vm_pgoff is 1, the cq_tptr_base includes 64K CQ,
+		 * a pointer of CQ need 2B size
+		 */
+		context->cq_tptr_base = mmap(NULL, HNS_ROCE_CQ_DB_BUF_SIZE,
+					     PROT_READ | PROT_WRITE, MAP_SHARED,
+					     cmd_fd, HNS_ROCE_TPTR_OFFSET);
+		if (context->cq_tptr_base == MAP_FAILED) {
+			fprintf(stderr,
+				PFX "Warning: Failed to mmap cq_tptr page.\n");
+			goto db_free;
+		}
+	}
+
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
 	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
@@ -102,6 +121,12 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	context->ibv_ctx.ops.reg_mr	   = hns_roce_u_reg_mr;
 	context->ibv_ctx.ops.dereg_mr	   = hns_roce_u_dereg_mr;
 
+	context->ibv_ctx.ops.create_cq     = hns_roce_u_create_cq;
+	context->ibv_ctx.ops.poll_cq	   = hr_dev->u_hw->poll_cq;
+	context->ibv_ctx.ops.req_notify_cq = hr_dev->u_hw->arm_cq;
+	context->ibv_ctx.ops.cq_event	   = hns_roce_u_cq_event;
+	context->ibv_ctx.ops.destroy_cq    = hns_roce_u_destroy_cq;
+
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
 
@@ -112,6 +137,16 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 	return &context->ibv_ctx;
 
 tptr_free:
+	if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+		if (munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE))
+			fprintf(stderr, PFX "Warning: Munmap tptr failed.\n");
+		context->cq_tptr_base = NULL;
+	}
+
+db_free:
+	munmap(context->uar, to_hr_dev(ibdev)->page_size);
+	context->uar = NULL;
+
 err_free:
 	free(context);
 	return NULL;
@@ -122,6 +157,8 @@ static void hns_roce_free_context(struct ibv_context *ibctx)
 	struct hns_roce_context *context = to_hr_ctx(ibctx);
 
 	munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+	if (to_hr_dev(ibctx->device)->hw_version == HNS_ROCE_HW_VER1)
+		munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE);
 
 	context->uar = NULL;
 
@@ -140,18 +177,26 @@ static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
 	struct hns_roce_device  *dev;
 	char			 value[128];
 	int			 i;
+	void			 *u_hw;
+	int			 hw_version;
 
 	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
 				value, sizeof(value)) > 0)
 		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
-			if (!strcmp(value, acpi_table[i].hid))
+			if (!strcmp(value, acpi_table[i].hid)) {
+				u_hw = acpi_table[i].data;
+				hw_version = acpi_table[i].version;
 				goto found;
+			}
 
 	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
 				value, sizeof(value)) > 0)
 		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
-			if (!strcmp(value, dt_table[i].compatible))
+			if (!strcmp(value, dt_table[i].compatible)) {
+				u_hw = dt_table[i].data;
+				hw_version = dt_table[i].version;
 				goto found;
+			}
 
 	return NULL;
 
@@ -164,6 +209,8 @@ found:
 	}
 
 	dev->ibv_dev.ops = hns_roce_dev_ops;
+	dev->u_hw = (struct hns_roce_u_hw *)u_hw;
+	dev->hw_version = hw_version;
 	dev->page_size   = sysconf(_SC_PAGESIZE);
 	return &dev->ibv_dev;
 }
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 5b73794..c3e364d 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -40,18 +40,53 @@
 #include <infiniband/verbs.h>
 #include <ccan/container_of.h>
 
+#define HNS_ROCE_CQE_ENTRY_SIZE		0x20
+
+#define HNS_ROCE_MAX_CQ_NUM		0x10000
+#define HNS_ROCE_MIN_CQE_NUM		0x40
+#define HNS_ROCE_CQ_DB_BUF_SIZE		((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
+#define HNS_ROCE_TPTR_OFFSET		0x1000
 #define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
 
 #define PFX				"hns: "
 
+#define roce_get_field(origin, mask, shift) \
+	(((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+	roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+	do { \
+		(origin) &= (~(mask)); \
+		(origin) |= (((unsigned int)(val) << (shift)) & (mask)); \
+	} while (0)
+
+#define roce_set_bit(origin, shift, val) \
+	roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
 enum {
 	HNS_ROCE_QP_TABLE_BITS		= 8,
 	HNS_ROCE_QP_TABLE_SIZE		= 1 << HNS_ROCE_QP_TABLE_BITS,
 };
 
+/* operation type list */
+enum {
+	/* rq&srq operation */
+	HNS_ROCE_OPCODE_SEND_DATA_RECEIVE         = 0x06,
+	HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE     = 0x07,
+};
+
 struct hns_roce_device {
 	struct ibv_device		ibv_dev;
 	int				page_size;
+	struct hns_roce_u_hw		*u_hw;
+	int				hw_version;
+};
+
+struct hns_roce_buf {
+	void				*buf;
+	unsigned int			length;
 };
 
 struct hns_roce_context {
@@ -59,7 +94,10 @@ struct hns_roce_context {
 	void				*uar;
 	pthread_spinlock_t		uar_lock;
 
+	void				*cq_tptr_base;
+
 	struct {
+		struct hns_roce_qp	**table;
 		int			refcnt;
 	} qp_table[HNS_ROCE_QP_TABLE_SIZE];
 
@@ -78,6 +116,44 @@ struct hns_roce_pd {
 	unsigned int			pdn;
 };
 
+struct hns_roce_cq {
+	struct ibv_cq			ibv_cq;
+	struct hns_roce_buf		buf;
+	pthread_spinlock_t		lock;
+	unsigned int			cqn;
+	unsigned int			cq_depth;
+	unsigned int			cons_index;
+	unsigned int			*set_ci_db;
+	unsigned int			*arm_db;
+	int				arm_sn;
+};
+
+struct hns_roce_wq {
+	unsigned long			*wrid;
+	unsigned int			wqe_cnt;
+	unsigned int			tail;
+	int				wqe_shift;
+	int				offset;
+};
+
+struct hns_roce_qp {
+	struct ibv_qp			ibv_qp;
+	struct hns_roce_buf		buf;
+	unsigned int			sq_signal_bits;
+	struct hns_roce_wq		sq;
+	struct hns_roce_wq		rq;
+};
+
+struct hns_roce_u_hw {
+	int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
+	int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+};
+
+static inline unsigned long align(unsigned long val, unsigned long align)
+{
+	return (val + align - 1) & ~(align - 1);
+}
+
 static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
 {
 	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -93,6 +169,11 @@ static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
 	return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
 }
 
+static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
+{
+	return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -105,4 +186,17 @@ struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
 				 int access);
 int hns_roce_u_dereg_mr(struct ibv_mr *mr);
 
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq);
+void hns_roce_u_cq_event(struct ibv_cq *cq);
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+		       int page_size);
+void hns_roce_free_buf(struct hns_roce_buf *buf);
+
+extern struct hns_roce_u_hw hns_roce_u_hw_v1;
+
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 0a0cd0c..1e62a7e 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -46,4 +46,16 @@ struct hns_roce_alloc_pd_resp {
 	__u32				reserved;
 };
 
+struct hns_roce_create_cq {
+	struct ibv_create_cq		ibv_cmd;
+	__u64				buf_addr;
+	__u64				db_addr;
+};
+
+struct hns_roce_create_cq_resp {
+	struct ibv_create_cq_resp	ibv_resp;
+	__u32				cqn;
+	__u32				reserved;
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_buf.c b/providers/hns/hns_roce_u_buf.c
new file mode 100644
index 0000000..f92ea65
--- /dev/null
+++ b/providers/hns/hns_roce_u_buf.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <errno.h>
+#include <sys/mman.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+		       int page_size)
+{
+	int ret;
+
+	buf->length = align(size, page_size);
+	buf->buf = mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf->buf == MAP_FAILED)
+		return errno;
+
+	ret = ibv_dontfork_range(buf->buf, size);
+	if (ret)
+		munmap(buf->buf, buf->length);
+
+	return ret;
+}
+
+void hns_roce_free_buf(struct hns_roce_buf *buf)
+{
+	ibv_dofork_range(buf->buf, buf->length);
+
+	munmap(buf->buf, buf->length);
+}
diff --git a/providers/hns/hns_roce_u_db.h b/providers/hns/hns_roce_u_db.h
new file mode 100644
index 0000000..76d13ce
--- /dev/null
+++ b/providers/hns/hns_roce_u_db.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/types.h>
+
+#include "hns_roce_u.h"
+
+#ifndef _HNS_ROCE_U_DB_H
+#define _HNS_ROCE_U_DB_H
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0])
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1])
+#else
+#error __BYTE_ORDER not defined
+#endif
+
+static inline void hns_roce_write64(uint32_t val[2],
+				    struct hns_roce_context *ctx, int offset)
+{
+	*(volatile uint64_t *) (ctx->uar + offset) = HNS_ROCE_PAIR_TO_64(val);
+}
+
+#endif /* _HNS_ROCE_U_DB_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
new file mode 100644
index 0000000..39a67b1
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -0,0 +1,368 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <malloc.h>
+#include "hns_roce_u_db.h"
+#include "hns_roce_u_hw_v1.h"
+#include "hns_roce_u.h"
+
+static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
+					  struct hns_roce_cq *cq)
+{
+	struct hns_roce_cq_db cq_db;
+
+	cq_db.u32_4 = 0;
+	cq_db.u32_8 = 0;
+
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+		       CQ_DB_U32_8_CMD_MDF_S, 0);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+		       cq->cqn);
+	roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+		       CQ_DB_U32_4_CONS_IDX_S,
+		       cq->cons_index & ((cq->cq_depth << 1) - 1));
+
+	hns_roce_write64((uint32_t *)&cq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_handle_error_cqe(struct hns_roce_cqe *cqe,
+				      struct ibv_wc *wc)
+{
+	switch (roce_get_field(cqe->cqe_byte_4,
+			       CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+			       CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) &
+		HNS_ROCE_CQE_STATUS_MASK) {
+		fprintf(stderr, PFX "error cqe!\n");
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR:
+		wc->status = IBV_WC_LOC_LEN_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR:
+		wc->status = IBV_WC_LOC_QP_OP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR:
+		wc->status = IBV_WC_LOC_PROT_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR:
+		wc->status = IBV_WC_WR_FLUSH_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR:
+		wc->status = IBV_WC_MW_BIND_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR:
+		wc->status = IBV_WC_BAD_RESP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+		wc->status = IBV_WC_LOC_ACCESS_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
+		wc->status = IBV_WC_REM_INV_REQ_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+		wc->status = IBV_WC_REM_ACCESS_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR:
+		wc->status = IBV_WC_REM_OP_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
+		wc->status = IBV_WC_RETRY_EXC_ERR;
+		break;
+	case HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
+		wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+		break;
+	default:
+		wc->status = IBV_WC_GENERAL_ERR;
+		break;
+	}
+}
+
+static struct hns_roce_cqe *get_cqe(struct hns_roce_cq *cq, int entry)
+{
+	return cq->buf.buf + entry * HNS_ROCE_CQE_ENTRY_SIZE;
+}
+
+static void *get_sw_cqe(struct hns_roce_cq *cq, int n)
+{
+	struct hns_roce_cqe *cqe = get_cqe(cq, n & cq->ibv_cq.cqe);
+
+	return (!!(roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_OWNER_S)) ^
+		!!(n & (cq->ibv_cq.cqe + 1))) ? cqe : NULL;
+}
+
+static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
+{
+	return get_sw_cqe(cq, cq->cons_index);
+}
+
+static void *get_send_wqe(struct hns_roce_qp *qp, int n)
+{
+	if ((n < 0) || (n > qp->sq.wqe_cnt)) {
+		printf("sq wqe index:%d,sq wqe cnt:%d\r\n", n, qp->sq.wqe_cnt);
+		return NULL;
+	}
+
+	return (void *)(qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift));
+}
+
+static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
+					    uint32_t qpn)
+{
+	int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+	if (ctx->qp_table[tind].refcnt) {
+		return ctx->qp_table[tind].table[qpn & ctx->qp_table_mask];
+	} else {
+		printf("hns_roce_find_qp fail!\n");
+		return NULL;
+	}
+}
+
+static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
+				struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
+{
+	uint32_t qpn;
+	int is_send;
+	uint16_t wqe_ctr;
+	uint32_t local_qpn;
+	struct hns_roce_wq *wq = NULL;
+	struct hns_roce_cqe *cqe = NULL;
+	struct hns_roce_wqe_ctrl_seg *sq_wqe = NULL;
+
+	/* According to CI, find the relative cqe */
+	cqe = next_cqe_sw(cq);
+	if (!cqe)
+		return CQ_EMPTY;
+
+	/* Get the next cqe, CI will be added gradually */
+	++cq->cons_index;
+
+	rmb();
+
+	qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+			     CQE_BYTE_16_LOCAL_QPN_S);
+
+	is_send = (roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_SQ_RQ_FLAG_S) ==
+		   HNS_ROCE_CQE_IS_SQ);
+
+	local_qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+				   CQE_BYTE_16_LOCAL_QPN_S);
+
+	/* if qp is zero, it will not get the correct qpn */
+	if (!*cur_qp ||
+	    (local_qpn & HNS_ROCE_CQE_QPN_MASK) != (*cur_qp)->ibv_qp.qp_num) {
+
+		*cur_qp = hns_roce_find_qp(to_hr_ctx(cq->ibv_cq.context),
+					   qpn & 0xffffff);
+		if (!*cur_qp) {
+			fprintf(stderr, PFX "can't find qp!\n");
+			return CQ_POLL_ERR;
+		}
+	}
+	wc->qp_num = qpn & 0xffffff;
+
+	if (is_send) {
+		wq = &(*cur_qp)->sq;
+		/*
+		 * if sq_signal_bits is 1, the tail pointer first update to
+		 * the wqe corresponding the current cqe
+		 */
+		if ((*cur_qp)->sq_signal_bits) {
+			wqe_ctr = (uint16_t)(roce_get_field(cqe->cqe_byte_4,
+						CQE_BYTE_4_WQE_INDEX_M,
+						CQE_BYTE_4_WQE_INDEX_S));
+			/*
+			 * wq->tail will plus a positive number every time,
+			 * when wq->tail exceeds 32b, it is 0 and acc
+			 */
+			wq->tail += (wqe_ctr - (uint16_t) wq->tail) &
+				    (wq->wqe_cnt - 1);
+		}
+		/* write the wr_id of wq into the wc */
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		++wq->tail;
+	} else {
+		wq = &(*cur_qp)->rq;
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		++wq->tail;
+	}
+
+	/*
+	 * HW maintains wc status, set the err type and directly return, after
+	 * generated the incorrect CQE
+	 */
+	if (roce_get_field(cqe->cqe_byte_4,
+	    CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+	    CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) != HNS_ROCE_CQE_SUCCESS) {
+		hns_roce_handle_error_cqe(cqe, wc);
+		return CQ_OK;
+	}
+	wc->status = IBV_WC_SUCCESS;
+
+	/*
+	 * According to the opcode type of cqe, mark the opcode and other
+	 * information of wc
+	 */
+	if (is_send) {
+		/* Get opcode and flag before update the tail point for send */
+		sq_wqe = (struct hns_roce_wqe_ctrl_seg *)
+			 get_send_wqe(*cur_qp, roce_get_field(cqe->cqe_byte_4,
+						CQE_BYTE_4_WQE_INDEX_M,
+						CQE_BYTE_4_WQE_INDEX_S));
+		switch (sq_wqe->flag & HNS_ROCE_WQE_OPCODE_MASK) {
+		case HNS_ROCE_WQE_OPCODE_SEND:
+			wc->opcode = IBV_WC_SEND;
+			break;
+		case HNS_ROCE_WQE_OPCODE_RDMA_READ:
+			wc->opcode = IBV_WC_RDMA_READ;
+			wc->byte_len = cqe->byte_cnt;
+			break;
+		case HNS_ROCE_WQE_OPCODE_RDMA_WRITE:
+			wc->opcode = IBV_WC_RDMA_WRITE;
+			break;
+		case HNS_ROCE_WQE_OPCODE_BIND_MW2:
+			wc->opcode = IBV_WC_BIND_MW;
+			break;
+		default:
+			wc->status = IBV_WC_GENERAL_ERR;
+			break;
+		}
+		wc->wc_flags = (sq_wqe->flag & HNS_ROCE_WQE_IMM ?
+				IBV_WC_WITH_IMM : 0);
+	} else {
+		/* Get opcode and flag in rq&srq */
+		wc->byte_len = (cqe->byte_cnt);
+
+		switch (roce_get_field(cqe->cqe_byte_4,
+				       CQE_BYTE_4_OPERATION_TYPE_M,
+				       CQE_BYTE_4_OPERATION_TYPE_S) &
+			HNS_ROCE_CQE_OPCODE_MASK) {
+		case HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE:
+			wc->opcode   = IBV_WC_RECV_RDMA_WITH_IMM;
+			wc->wc_flags = IBV_WC_WITH_IMM;
+			wc->imm_data = cqe->immediate_data;
+			break;
+		case HNS_ROCE_OPCODE_SEND_DATA_RECEIVE:
+			if (roce_get_bit(cqe->cqe_byte_4,
+					 CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S)) {
+				wc->opcode   = IBV_WC_RECV;
+				wc->wc_flags = IBV_WC_WITH_IMM;
+				wc->imm_data = cqe->immediate_data;
+			} else {
+				wc->opcode   = IBV_WC_RECV;
+				wc->wc_flags = 0;
+			}
+			break;
+		default:
+			wc->status = IBV_WC_GENERAL_ERR;
+			break;
+		}
+	}
+
+	return CQ_OK;
+}
+
+static int hns_roce_u_v1_poll_cq(struct ibv_cq *ibvcq, int ne,
+				 struct ibv_wc *wc)
+{
+	int npolled;
+	int err = CQ_OK;
+	struct hns_roce_qp *qp = NULL;
+	struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+	struct hns_roce_context *ctx = to_hr_ctx(ibvcq->context);
+	struct hns_roce_device *dev = to_hr_dev(ibvcq->context->device);
+
+	pthread_spin_lock(&cq->lock);
+
+	for (npolled = 0; npolled < ne; ++npolled) {
+		err = hns_roce_v1_poll_one(cq, &qp, wc + npolled);
+		if (err != CQ_OK)
+			break;
+	}
+
+	if (npolled) {
+		if (dev->hw_version == HNS_ROCE_HW_VER1) {
+			*cq->set_ci_db = (unsigned short)(cq->cons_index &
+					 ((cq->cq_depth << 1) - 1));
+			mb();
+		}
+
+		hns_roce_update_cq_cons_index(ctx, cq);
+	}
+
+	pthread_spin_unlock(&cq->lock);
+
+	return err == CQ_POLL_ERR ? err : npolled;
+}
+
+/**
+ * hns_roce_u_v1_arm_cq - request completion notification on a CQ
+ * @ibvcq: The completion queue to request notification for.
+ * @solicited: If non-zero, a event will be generated only for
+ *	      the next solicited CQ entry. If zero, any CQ entry,
+ *	      solicited or not, will generate an event
+ */
+static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
+{
+	uint32_t ci;
+	uint32_t solicited_flag;
+	struct hns_roce_cq_db cq_db;
+	struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+
+	ci  = cq->cons_index & ((cq->cq_depth << 1) - 1);
+	solicited_flag = solicited ? HNS_ROCE_CQ_DB_REQ_SOL :
+				     HNS_ROCE_CQ_DB_REQ_NEXT;
+
+	cq_db.u32_4 = 0;
+	cq_db.u32_8 = 0;
+
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+		       CQ_DB_U32_8_CMD_MDF_S, 1);
+	roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_NOTIFY_TYPE_S, solicited_flag);
+	roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+		       cq->cqn);
+	roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+		       CQ_DB_U32_4_CONS_IDX_S, ci);
+
+	hns_roce_write64((uint32_t *)&cq_db, to_hr_ctx(ibvcq->context),
+			  ROCEE_DB_OTHERS_L_0_REG);
+	return 0;
+}
+
+struct hns_roce_u_hw hns_roce_u_hw_v1 = {
+	.poll_cq = hns_roce_u_v1_poll_cq,
+	.arm_cq = hns_roce_u_v1_arm_cq,
+};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
new file mode 100644
index 0000000..b249f54
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_HW_V1_H
+#define _HNS_ROCE_U_HW_V1_H
+
+#define HNS_ROCE_CQ_DB_REQ_SOL			1
+#define HNS_ROCE_CQ_DB_REQ_NEXT			0
+
+#define HNS_ROCE_CQE_IS_SQ			0
+
+#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN	32
+
+enum {
+	HNS_ROCE_WQE_IMM		= 1 << 23,
+	HNS_ROCE_WQE_OPCODE_SEND        = 0 << 16,
+	HNS_ROCE_WQE_OPCODE_RDMA_READ   = 1 << 16,
+	HNS_ROCE_WQE_OPCODE_RDMA_WRITE  = 2 << 16,
+	HNS_ROCE_WQE_OPCODE_BIND_MW2    = 6 << 16,
+	HNS_ROCE_WQE_OPCODE_MASK        = 15 << 16,
+};
+
+struct hns_roce_wqe_ctrl_seg {
+	__be32		sgl_pa_h;
+	__be32		flag;
+};
+
+enum {
+	CQ_OK				=  0,
+	CQ_EMPTY			= -1,
+	CQ_POLL_ERR			= -2,
+};
+
+enum {
+	HNS_ROCE_CQE_QPN_MASK		= 0x3ffff,
+	HNS_ROCE_CQE_STATUS_MASK	= 0x1f,
+	HNS_ROCE_CQE_OPCODE_MASK	= 0xf,
+};
+
+enum {
+	HNS_ROCE_CQE_SUCCESS,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR,
+	HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR,
+	HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR,
+	HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR,
+	HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR,
+	HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR,
+	HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR,
+	HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR,
+};
+
+struct hns_roce_cq_db {
+	unsigned int u32_4;
+	unsigned int u32_8;
+};
+#define CQ_DB_U32_4_CONS_IDX_S 0
+#define CQ_DB_U32_4_CONS_IDX_M   (((1UL << 16) - 1) << CQ_DB_U32_4_CONS_IDX_S)
+
+#define CQ_DB_U32_8_CQN_S 0
+#define CQ_DB_U32_8_CQN_M   (((1UL << 16) - 1) << CQ_DB_U32_8_CQN_S)
+
+#define CQ_DB_U32_8_NOTIFY_TYPE_S 16
+
+#define CQ_DB_U32_8_CMD_MDF_S 24
+#define CQ_DB_U32_8_CMD_MDF_M   (((1UL << 4) - 1) << CQ_DB_U32_8_CMD_MDF_S)
+
+#define CQ_DB_U32_8_CMD_S 28
+#define CQ_DB_U32_8_CMD_M   (((1UL << 3) - 1) << CQ_DB_U32_8_CMD_S)
+
+#define CQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_cqe {
+	unsigned int cqe_byte_4;
+	union {
+		unsigned int r_key;
+		unsigned int immediate_data;
+	};
+	unsigned int byte_cnt;
+	unsigned int cqe_byte_16;
+	unsigned int cqe_byte_20;
+	unsigned int s_mac_l;
+	unsigned int cqe_byte_28;
+	unsigned int reserved;
+};
+#define CQE_BYTE_4_OPERATION_TYPE_S 0
+#define CQE_BYTE_4_OPERATION_TYPE_M   \
+	(((1UL << 4) - 1) << CQE_BYTE_4_OPERATION_TYPE_S)
+
+#define CQE_BYTE_4_OWNER_S 7
+
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_S 8
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_M   \
+	(((1UL << 5) - 1) << CQE_BYTE_4_STATUS_OF_THE_OPERATION_S)
+
+#define CQE_BYTE_4_SQ_RQ_FLAG_S 14
+
+#define CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S 15
+
+#define CQE_BYTE_4_WQE_INDEX_S 16
+#define CQE_BYTE_4_WQE_INDEX_M	(((1UL << 14) - 1) << CQE_BYTE_4_WQE_INDEX_S)
+
+#define CQE_BYTE_16_LOCAL_QPN_S 0
+#define CQE_BYTE_16_LOCAL_QPN_M	(((1UL << 24) - 1) << CQE_BYTE_16_LOCAL_QPN_S)
+
+#define ROCEE_DB_SQ_L_0_REG				0x230
+
+#define ROCEE_DB_OTHERS_L_0_REG				0x238
+
+struct hns_roce_rc_send_wqe {
+	unsigned int sgl_ba_31_0;
+	unsigned int u32_1;
+	union {
+		unsigned int r_key;
+		unsigned int immediate_data;
+	};
+	unsigned int msg_length;
+	unsigned int rvd_3;
+	unsigned int rvd_4;
+	unsigned int rvd_5;
+	unsigned int rvd_6;
+	uint64_t     va0;
+	unsigned int l_key0;
+	unsigned int length0;
+
+	uint64_t     va1;
+	unsigned int l_key1;
+	unsigned int length1;
+};
+
+#endif /* _HNS_ROCE_U_HW_V1_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index c163d3c..c9324dd 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -40,6 +40,8 @@
 #include <unistd.h>
 
 #include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+#include "hns_roce_u_hw_v1.h"
 
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr)
@@ -150,3 +152,117 @@ int hns_roce_u_dereg_mr(struct ibv_mr *mr)
 
 	return ret;
 }
+
+static int align_cq_size(int req)
+{
+	int nent;
+
+	for (nent = HNS_ROCE_MIN_CQE_NUM; nent < req; nent <<= 1)
+		;
+
+	return nent;
+}
+
+static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
+{
+	if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
+		fprintf(stderr, "cqe = %d, less than minimum CQE number.\n",
+			*cqe);
+		*cqe = HNS_ROCE_MIN_CQE_NUM;
+	}
+
+	if (*cqe > context->max_cqe)
+		return -1;
+
+	return 0;
+}
+
+static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
+				 struct hns_roce_buf *buf, int nent)
+{
+	if (hns_roce_alloc_buf(buf,
+			align(nent * HNS_ROCE_CQE_ENTRY_SIZE, dev->page_size),
+			dev->page_size))
+		return -1;
+	memset(buf->buf, 0, nent * HNS_ROCE_CQE_ENTRY_SIZE);
+
+	return 0;
+}
+
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector)
+{
+	struct hns_roce_create_cq	cmd;
+	struct hns_roce_create_cq_resp	resp;
+	struct hns_roce_cq		*cq;
+	int				ret;
+
+	if (hns_roce_verify_cq(&cqe, to_hr_ctx(context)))
+		return NULL;
+
+	cq = malloc(sizeof(*cq));
+	if (!cq)
+		return NULL;
+
+	cq->cons_index = 0;
+
+	if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
+		goto err;
+
+	cqe = align_cq_size(cqe);
+
+	if (hns_roce_alloc_cq_buf(to_hr_dev(context->device), &cq->buf, cqe))
+		goto err;
+
+	cmd.buf_addr = (uintptr_t) cq->buf.buf;
+
+	ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
+				&cq->ibv_cq, &cmd.ibv_cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp));
+	if (ret)
+		goto err_db;
+
+	cq->cqn = resp.cqn;
+	cq->cq_depth = cqe;
+
+	if (to_hr_dev(context->device)->hw_version == HNS_ROCE_HW_VER1)
+		cq->set_ci_db = to_hr_ctx(context)->cq_tptr_base + cq->cqn * 2;
+	else
+		cq->set_ci_db = to_hr_ctx(context)->uar +
+				ROCEE_DB_OTHERS_L_0_REG;
+
+	cq->arm_db    = cq->set_ci_db;
+	cq->arm_sn    = 1;
+	*(cq->set_ci_db) = 0;
+	*(cq->arm_db) = 0;
+
+	return &cq->ibv_cq;
+
+err_db:
+	hns_roce_free_buf(&cq->buf);
+
+err:
+	free(cq);
+
+	return NULL;
+}
+
+void hns_roce_u_cq_event(struct ibv_cq *cq)
+{
+	to_hr_cq(cq)->arm_sn++;
+}
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq)
+{
+	int ret;
+
+	ret = ibv_cmd_destroy_cq(cq);
+	if (ret)
+		return ret;
+
+	hns_roce_free_buf(&to_hr_cq(cq)->buf);
+	free(to_hr_cq(cq));
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 3/7] libhns: Add verbs of pd and mr support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces the verbs with pd and mr,
included alloc_pd, dealloc_pd, reg_mr and dereg_mr.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4:
- No change over v3

v3:
This fixes the comments given by Leon Romanovsky on PATCHv2

v2:
- No change over v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  4 ++
 providers/hns/hns_roce_u.h       | 18 +++++++++
 providers/hns/hns_roce_u_abi.h   |  6 +++
 providers/hns/hns_roce_u_verbs.c | 79 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index c0f6fe9..53e2720 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -97,6 +97,10 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 
 	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
 	context->ibv_ctx.ops.query_port    = hns_roce_u_query_port;
+	context->ibv_ctx.ops.alloc_pd	   = hns_roce_u_alloc_pd;
+	context->ibv_ctx.ops.dealloc_pd    = hns_roce_u_free_pd;
+	context->ibv_ctx.ops.reg_mr	   = hns_roce_u_reg_mr;
+	context->ibv_ctx.ops.dereg_mr	   = hns_roce_u_dereg_mr;
 
 	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
 		goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index aa58ee6..5b73794 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -73,6 +73,11 @@ struct hns_roce_context {
 	int				max_cqe;
 };
 
+struct hns_roce_pd {
+	struct ibv_pd			ibv_pd;
+	unsigned int			pdn;
+};
+
 static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
 {
 	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -83,8 +88,21 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
 	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
 }
 
+static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
+{
+	return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
+}
+
 int hns_roce_u_query_device(struct ibv_context *context,
 			    struct ibv_device_attr *attr);
 int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
 			  struct ibv_port_attr *attr);
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context);
+int hns_roce_u_free_pd(struct ibv_pd *pd);
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+				 int access);
+int hns_roce_u_dereg_mr(struct ibv_mr *mr);
+
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 4bfc8fa..0a0cd0c 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -40,4 +40,10 @@ struct hns_roce_alloc_ucontext_resp {
 	__u32				qp_tab_size;
 };
 
+struct hns_roce_alloc_pd_resp {
+	struct ibv_alloc_pd_resp	ibv_resp;
+	__u32				pdn;
+	__u32				reserved;
+};
+
 #endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 2d95f20..c163d3c 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -71,3 +71,82 @@ int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
 
 	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
 }
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context)
+{
+	struct ibv_alloc_pd cmd;
+	struct hns_roce_pd *pd;
+	struct hns_roce_alloc_pd_resp resp;
+
+	pd = (struct hns_roce_pd *)malloc(sizeof(*pd));
+	if (!pd)
+		return NULL;
+
+	if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof(cmd),
+			     &resp.ibv_resp, sizeof(resp))) {
+		free(pd);
+		return NULL;
+	}
+
+	pd->pdn = resp.pdn;
+
+	return &pd->ibv_pd;
+}
+
+int hns_roce_u_free_pd(struct ibv_pd *pd)
+{
+	int ret;
+
+	ret = ibv_cmd_dealloc_pd(pd);
+	if (ret)
+		return ret;
+
+	free(to_hr_pd(pd));
+
+	return ret;
+}
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+				 int access)
+{
+	int ret;
+	struct ibv_mr *mr;
+	struct ibv_reg_mr cmd;
+	struct ibv_reg_mr_resp resp;
+
+	if (!addr) {
+		fprintf(stderr, "2nd parm addr is NULL!\n");
+		return NULL;
+	}
+
+	if (!length) {
+		fprintf(stderr, "3st parm length is 0!\n");
+		return NULL;
+	}
+
+	mr = malloc(sizeof(*mr));
+	if (!mr)
+		return NULL;
+
+	ret = ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr, access, mr,
+			     &cmd, sizeof(cmd), &resp, sizeof(resp));
+	if (ret) {
+		free(mr);
+		return NULL;
+	}
+
+	return mr;
+}
+
+int hns_roce_u_dereg_mr(struct ibv_mr *mr)
+{
+	int ret;
+
+	ret = ibv_cmd_dereg_mr(mr);
+	if (ret)
+		return ret;
+
+	free(mr);
+
+	return ret;
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 2/7] libhns: Add verbs of querying device and querying port
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces query verbs for querying device
and querying port.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5:
Change the type of raw_fw_ver for uint64_t

v4/v3/v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c       |  7 ++++
 providers/hns/hns_roce_u.h       |  4 +++
 providers/hns/hns_roce_u_verbs.c | 73 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)
 create mode 100644 providers/hns/hns_roce_u_verbs.c

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index bda4dd8..c0f6fe9 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -95,12 +95,19 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
 
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
+	context->ibv_ctx.ops.query_device  = hns_roce_u_query_device;
+	context->ibv_ctx.ops.query_port    = hns_roce_u_query_port;
+
+	if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
+		goto tptr_free;
+
 	context->max_qp_wr = dev_attrs.max_qp_wr;
 	context->max_sge = dev_attrs.max_sge;
 	context->max_cqe = dev_attrs.max_cqe;
 
 	return &context->ibv_ctx;
 
+tptr_free:
 err_free:
 	free(context);
 	return NULL;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 3eef171..aa58ee6 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -83,4 +83,8 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
 	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
 }
 
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr);
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr);
 #endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
new file mode 100644
index 0000000..2d95f20
--- /dev/null
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_u_query_device(struct ibv_context *context,
+			    struct ibv_device_attr *attr)
+{
+	int ret;
+	struct ibv_query_device cmd;
+	uint64_t raw_fw_ver;
+	unsigned int major, minor, sub_minor;
+
+	ret = ibv_cmd_query_device(context, attr, &raw_fw_ver, &cmd,
+				   sizeof(cmd));
+	if (ret)
+		return ret;
+
+	major	   = (raw_fw_ver >> 32) & 0xffff;
+	minor	   = (raw_fw_ver >> 16) & 0xffff;
+	sub_minor = raw_fw_ver & 0xffff;
+
+	snprintf(attr->fw_ver, sizeof(attr->fw_ver), "%d.%d.%03d", major, minor,
+		 sub_minor);
+
+	return 0;
+}
+
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+			  struct ibv_port_attr *attr)
+{
+	struct ibv_query_port cmd;
+
+	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
+}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 1/7] libhns: Add initial main frame
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

This patch mainly introduces initial main frame for
userspace library of hns.

Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3/v2:
- No change over the v1

v1:
- The initial submit
---
 providers/hns/hns_roce_u.c     | 163 +++++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u.h     |  86 ++++++++++++++++++++++
 providers/hns/hns_roce_u_abi.h |  43 +++++++++++
 3 files changed, 292 insertions(+)
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h

diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
new file mode 100644
index 0000000..bda4dd8
--- /dev/null
+++ b/providers/hns/hns_roce_u.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+
+#define HID_LEN			15
+#define DEV_MATCH_LEN		128
+
+static const struct {
+	char	 hid[HID_LEN];
+} acpi_table[] = {
+	{"acpi:HISI00D1:"},
+	{},
+};
+
+static const struct {
+	char	 compatible[DEV_MATCH_LEN];
+} dt_table[] = {
+	{"hisilicon,hns-roce-v1"},
+	{},
+};
+
+static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
+						  int cmd_fd)
+{
+	int i;
+	struct ibv_get_context cmd;
+	struct ibv_device_attr dev_attrs;
+	struct hns_roce_context *context;
+	struct hns_roce_alloc_ucontext_resp resp;
+	struct hns_roce_device *hr_dev = to_hr_dev(ibdev);
+
+	context = calloc(1, sizeof(*context));
+	if (!context)
+		return NULL;
+
+	context->ibv_ctx.cmd_fd = cmd_fd;
+	if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp)))
+		goto err_free;
+
+	context->num_qps = resp.qp_tab_size;
+	context->qp_table_shift = ffs(context->num_qps) - 1 -
+				  HNS_ROCE_QP_TABLE_BITS;
+	context->qp_table_mask = (1 << context->qp_table_shift) - 1;
+
+	pthread_mutex_init(&context->qp_table_mutex, NULL);
+	for (i = 0; i < HNS_ROCE_QP_TABLE_SIZE; ++i)
+		context->qp_table[i].refcnt = 0;
+
+	context->uar = mmap(NULL, to_hr_dev(ibdev)->page_size,
+			    PROT_READ | PROT_WRITE, MAP_SHARED, cmd_fd, 0);
+	if (context->uar == MAP_FAILED) {
+		fprintf(stderr, PFX "Warning: failed to mmap() uar page.\n");
+		goto err_free;
+	}
+
+	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+
+	context->max_qp_wr = dev_attrs.max_qp_wr;
+	context->max_sge = dev_attrs.max_sge;
+	context->max_cqe = dev_attrs.max_cqe;
+
+	return &context->ibv_ctx;
+
+err_free:
+	free(context);
+	return NULL;
+}
+
+static void hns_roce_free_context(struct ibv_context *ibctx)
+{
+	struct hns_roce_context *context = to_hr_ctx(ibctx);
+
+	munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+
+	context->uar = NULL;
+
+	free(context);
+	context = NULL;
+}
+
+static struct ibv_device_ops hns_roce_dev_ops = {
+	.alloc_context = hns_roce_alloc_context,
+	.free_context	= hns_roce_free_context
+};
+
+static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
+					       int abi_version)
+{
+	struct hns_roce_device  *dev;
+	char			 value[128];
+	int			 i;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
+			if (!strcmp(value, acpi_table[i].hid))
+				goto found;
+
+	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
+				value, sizeof(value)) > 0)
+		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
+			if (!strcmp(value, dt_table[i].compatible))
+				goto found;
+
+	return NULL;
+
+found:
+	dev = malloc(sizeof(struct hns_roce_device));
+	if (!dev) {
+		fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n",
+			uverbs_sys_path);
+		return NULL;
+	}
+
+	dev->ibv_dev.ops = hns_roce_dev_ops;
+	dev->page_size   = sysconf(_SC_PAGESIZE);
+	return &dev->ibv_dev;
+}
+
+static __attribute__((constructor)) void hns_roce_register_driver(void)
+{
+	ibv_register_driver("hns", hns_roce_driver_init);
+}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
new file mode 100644
index 0000000..3eef171
--- /dev/null
+++ b/providers/hns/hns_roce_u.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_H
+#define _HNS_ROCE_U_H
+
+#include <stddef.h>
+
+#include <infiniband/driver.h>
+#include <infiniband/arch.h>
+#include <infiniband/verbs.h>
+#include <ccan/container_of.h>
+
+#define HNS_ROCE_HW_VER1		('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
+#define PFX				"hns: "
+
+enum {
+	HNS_ROCE_QP_TABLE_BITS		= 8,
+	HNS_ROCE_QP_TABLE_SIZE		= 1 << HNS_ROCE_QP_TABLE_BITS,
+};
+
+struct hns_roce_device {
+	struct ibv_device		ibv_dev;
+	int				page_size;
+};
+
+struct hns_roce_context {
+	struct ibv_context		ibv_ctx;
+	void				*uar;
+	pthread_spinlock_t		uar_lock;
+
+	struct {
+		int			refcnt;
+	} qp_table[HNS_ROCE_QP_TABLE_SIZE];
+
+	pthread_mutex_t			qp_table_mutex;
+
+	int				num_qps;
+	int				qp_table_shift;
+	int				qp_table_mask;
+	unsigned int			max_qp_wr;
+	unsigned int			max_sge;
+	int				max_cqe;
+};
+
+static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
+{
+	return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
+}
+
+static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
+{
+	return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
+}
+
+#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
new file mode 100644
index 0000000..4bfc8fa
--- /dev/null
+++ b/providers/hns/hns_roce_u_abi.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_ABI_H
+#define _HNS_ROCE_U_ABI_H
+
+#include <infiniband/kern-abi.h>
+
+struct hns_roce_alloc_ucontext_resp {
+	struct ibv_get_context_resp	ibv_resp;
+	__u32				qp_tab_size;
+};
+
+#endif /* _HNS_ROCE_U_ABI_H */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 rdma-core 0/7] libhns: userspace library for hns
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA

This patch series introduces userspace library for hns RoCE driver.

changes v4 -> v5:
1. elminate the warning when CFLAGS equal to -m32

changes v3 -> v4:
1. eliminate the warning by Travis CI testing

changes v2 -> v3:
1. Fix the code style, for example, if (addr == NULL)
2. Fix the bug for hns_roce_u_reg_mr

changes v1 -> v2:
1. Delete the min() definition and instead of ccan header
2. Delete the CHECK_C_SOURCE_COMPILES
3. sort the c file in rdma_provider()
4. Delete the unused code in hns_roce_u_db.h

Lijun Ou (7):
  libhns: Add initial main frame
  libhns: Add verbs of querying device and querying port
  libhns: Add verbs of pd and mr support
  libhns: Add verbs of cq support
  libhns: Add verbs of qp support
  libhns: Add verbs of post_send and post_recv support
  libhns: Add consolidated repo for userspace library of hns

 CMakeLists.txt                   |   1 +
 MAINTAINERS                      |   6 +
 README.md                        |   1 +
 providers/hns/CMakeLists.txt     |   6 +
 providers/hns/hns_roce_u.c       | 228 +++++++++++
 providers/hns/hns_roce_u.h       | 255 ++++++++++++
 providers/hns/hns_roce_u_abi.h   |  69 ++++
 providers/hns/hns_roce_u_buf.c   |  61 +++
 providers/hns/hns_roce_u_db.h    |  54 +++
 providers/hns/hns_roce_u_hw_v1.c | 837 +++++++++++++++++++++++++++++++++++++++
 providers/hns/hns_roce_u_hw_v1.h | 242 +++++++++++
 providers/hns/hns_roce_u_verbs.c | 525 ++++++++++++++++++++++++
 12 files changed, 2285 insertions(+)
 create mode 100644 providers/hns/CMakeLists.txt
 create mode 100644 providers/hns/hns_roce_u.c
 create mode 100644 providers/hns/hns_roce_u.h
 create mode 100644 providers/hns/hns_roce_u_abi.h
 create mode 100644 providers/hns/hns_roce_u_buf.c
 create mode 100644 providers/hns/hns_roce_u_db.h
 create mode 100644 providers/hns/hns_roce_u_hw_v1.c
 create mode 100644 providers/hns/hns_roce_u_hw_v1.h
 create mode 100644 providers/hns/hns_roce_u_verbs.c

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 rdma-core 0/7] libhns: userspace library for hns
From: oulijun @ 2016-11-13 10:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161111130350.GI28957-2ukJVAZIZ/Y@public.gmane.org>

在 2016/11/11 21:03, Leon Romanovsky 写道:
> On Fri, Nov 11, 2016 at 12:57:23PM +0800, Lijun Ou wrote:
>> This patch series introduces userspace library for hns RoCE driver.
>>
>> changes v3 -> v4:
>> 1. eliminate the warning by Travis CI testing
> 
> It still fails.
> https://travis-ci.org/linux-rdma/rdma-core/builds/175060470
> 
> Please test it locally.
> 
> Thanks.
>
Hi, leon
  I have passed the Travis CI testing according to your advice [2] and
sent the patchv5

Lijun Ou

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 rdma-core 0/7] libhns: userspace library for hns
From: oulijun @ 2016-11-13 10:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161111124240.GG28957-2ukJVAZIZ/Y@public.gmane.org>

在 2016/11/11 20:42, Leon Romanovsky 写道:
> On Fri, Nov 11, 2016 at 02:37:46PM +0800, oulijun wrote:
>> 在 2016/11/11 0:15, Leon Romanovsky 写道:
>>> On Thu, Nov 10, 2016 at 08:46:10PM +0800, Lijun Ou wrote:
>>>> This patch series introduces userspace library for hns RoCE driver.
>>>>
>>>> changes v2 -> v3:
>>>> 1. Fix the code style, for example, if (addr == NULL)
>>>> 2. Fix the bug for hns_roce_u_reg_mr
>>>>
>>>> changes v1 -> v2:
>>>> 1. Delete the min() definition and instead of ccan header
>>>> 2. Delete the CHECK_C_SOURCE_COMPILES
>>>> 3. sort the c file in rdma_provider()
>>>> 4. Delete the unused code in hns_roce_u_db.h
>>>>
>>>> Lijun Ou (7):
>>>>   libhns: Add initial main frame
>>>>   libhns: Add verbs of querying device and querying port
>>>>   libhns: Add verbs of pd and mr support
>>>>   libhns: Add verbs of cq support
>>>>   libhns: Add verbs of qp support
>>>>   libhns: Add verbs of post_send and post_recv support
>>>>   libhns: Add consolidated repo for userspace library of hns
>>>
>>> Hi Lijun,
>>>
>>> I tried to take your patch set, but it fails to pass our Travis CI
>>> tests.
>>>
>>> https://github.com/linux-rdma/rdma-core/pull/38
>>> https://travis-ci.org/linux-rdma/rdma-core/builds/174815897
>>>
>>> You need to fix it prior to our acceptance.
>>>
>>> Thanks
>>>
>> Hi, Leon
>>   I have checked the patches and fixed it and send the patchv4. I think that
>> the patchv4 will not have errors and warnings for travis CI testing, but
>> I don't have the Travis CI environment for testing it.
>> I have tried to install the test environment according to your test log and it
>> is fail.
>> In addtion that, I have tested it for  patchv1/patchv2/patch3 by use the following way and it
>> is ok
>>  1. directly use the default script file build.sh
>>     ./build.sh
>>  2. use CC=aarch64-linux-gnu-gcc build
>>     mkdir build
>>     cd build
>>     CC=aarch64-linux-gnu-gcc cmake -Gninja -DENABLE_RESOLVE_NEIGH=0 ..
> 
> Generally speaking, you have 4 possible ways to check your code prior
> your submission:
> 1. Use Travis CI locally [1].
> 2. Fork rdma-core to your personal github account which can be connected
> to Travis CI for free.
> 3. Install all required tools and run/compile locally
> 4. Send pull request to rdma-core via github interface and it will check
> automatically.
> 
> First 3 options are the best.
> 
> [1]
> https://docs.travis-ci.com/user/common-build-problems/#Troubleshooting-Locally-in-a-Docker-Image
> 
Thanks for your guide. I have test it according the [2] and sent patchv5

Lijun Ou
>>
>> lijun Ou
>>>
>>>>
>>>>  CMakeLists.txt                   |   1 +
>>>>  MAINTAINERS                      |   6 +
>>>>  README.md                        |   1 +
>>>>  providers/hns/CMakeLists.txt     |   6 +
>>>>  providers/hns/hns_roce_u.c       | 228 +++++++++++
>>>>  providers/hns/hns_roce_u.h       | 255 ++++++++++++
>>>>  providers/hns/hns_roce_u_abi.h   |  69 ++++
>>>>  providers/hns/hns_roce_u_buf.c   |  61 +++
>>>>  providers/hns/hns_roce_u_db.h    |  54 +++
>>>>  providers/hns/hns_roce_u_hw_v1.c | 839 +++++++++++++++++++++++++++++++++++++++
>>>>  providers/hns/hns_roce_u_hw_v1.h | 242 +++++++++++
>>>>  providers/hns/hns_roce_u_verbs.c | 525 ++++++++++++++++++++++++
>>>>  12 files changed, 2287 insertions(+)
>>>>  create mode 100644 providers/hns/CMakeLists.txt
>>>>  create mode 100644 providers/hns/hns_roce_u.c
>>>>  create mode 100644 providers/hns/hns_roce_u.h
>>>>  create mode 100644 providers/hns/hns_roce_u_abi.h
>>>>  create mode 100644 providers/hns/hns_roce_u_buf.c
>>>>  create mode 100644 providers/hns/hns_roce_u_db.h
>>>>  create mode 100644 providers/hns/hns_roce_u_hw_v1.c
>>>>  create mode 100644 providers/hns/hns_roce_u_hw_v1.h
>>>>  create mode 100644 providers/hns/hns_roce_u_verbs.c
>>>>
>>>> --
>>>> 1.9.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 rdma-core 0/7] libhns: userspace library for hns
From: oulijun @ 2016-11-13 10:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161111160202.GA22935-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

在 2016/11/12 0:02, Jason Gunthorpe 写道:
> On Fri, Nov 11, 2016 at 02:42:40PM +0200, Leon Romanovsky wrote:
> 
>>>   I have checked the patches and fixed it and send the patchv4. I think that
>>> the patchv4 will not have errors and warnings for travis CI testing, but
>>> I don't have the Travis CI environment for testing it.
>>> I have tried to install the test environment according to your test log and it
>>> is fail.
>>> In addtion that, I have tested it for  patchv1/patchv2/patch3 by use the following way and it
>>> is ok
>>>  1. directly use the default script file build.sh
>>>     ./build.sh
>>>  2. use CC=aarch64-linux-gnu-gcc build
>>>     mkdir build
>>>     cd build
>>>     CC=aarch64-linux-gnu-gcc cmake -Gninja -DENABLE_RESOLVE_NEIGH=0 ..
> 
> For the 32 bit bugs all you need to do is run on x86 with the 32 bit
> compiler support installed:
> 
> $ CFLAGS=-m32 cmake -GNinja ..
> $ ninja
> 
>> 4. Send pull request to rdma-core via github interface and it will check
>> automatically.
> 
> This is certainly a better option than having Leon test your patches,
> just keep force pushing to your own pull request until travis is
> happy.
> 
> Make sure the 32 bit problems are fixed properly, use inttypes.h,
> uintptr, and so forth.
> 
> Jason
> 
Thanks your guide. I have test it by travis CI successfully and sent tha patchV5

Lijun Ou

> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* I Hope You Get My Message This Time
From: Mr Friedrich Mayrhofer @ 2016-11-12  6:41 UTC (permalink / raw)



-- 
This is the second time i am sending you this mail.

I, Friedrich Mayrhofer Donate $ 1,000,000.00 to You, Email  Me  
personally for more details.

Regards.
Friedrich Mayrhofer

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-next V1 06/17] IB/hfi1: Remove debug prints after allocation failure
From: Dalessandro, Dennis @ 2016-11-12  4:04 UTC (permalink / raw)
  To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1478184265-9620-7-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 466 bytes --]

On Thu, 2016-11-03 at 16:44 +0200, Leon Romanovsky wrote:
> The prints after [k|v][m|z|c]alloc() functions are not needed,
> because in case of failure, allocator will print their internal
> error prints anyway.
> 
> Signed-off-by: Leon Romanovsky <leon@kernel.org>

Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply

* [PATCH, RESEND] IB/srpt: Report login failures only once
From: Bart Van Assche @ 2016-11-12  0:36 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Nicholas A. Bellinger, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Report the following message only once if no ACL has been configured
yet for an initiator port:

"Rejected login because no ACL has been configured yet for initiator %s.\n"

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index b268aaa..0951e4f 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1842,7 +1842,6 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	struct srpt_rdma_ch *ch, *tmp_ch;
 	u32 it_iu_len;
 	int i, ret = 0;
-	unsigned char *p;
 
 	WARN_ON_ONCE(irqs_disabled());
 
@@ -1996,21 +1995,18 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 			be64_to_cpu(*(__be64 *)(ch->i_port_id + 8)));
 
 	pr_debug("registering session %s\n", ch->sess_name);
-	p = &ch->sess_name[0];
 
-try_again:
 	ch->sess = target_alloc_session(&sport->port_tpg_1, 0, 0,
-					TARGET_PROT_NORMAL, p, ch, NULL);
+					TARGET_PROT_NORMAL, ch->sess_name, ch,
+					NULL);
+	/* Retry without leading "0x" */
+	if (IS_ERR(ch->sess))
+		ch->sess = target_alloc_session(&sport->port_tpg_1, 0, 0,
+						TARGET_PROT_NORMAL,
+						ch->sess_name + 2, ch, NULL);
 	if (IS_ERR(ch->sess)) {
-		pr_info("Rejected login because no ACL has been"
-			" configured yet for initiator %s.\n", p);
-		/*
-		 * XXX: Hack to retry of ch->i_port_id without leading '0x'
-		 */
-		if (p == &ch->sess_name[0]) {
-			p += 2;
-			goto try_again;
-		}
+		pr_info("Rejected login because no ACL has been configured yet for initiator %s.\n",
+			ch->sess_name);
 		rej->reason = cpu_to_be32((PTR_ERR(ch->sess) == -ENOMEM) ?
 				SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES :
 				SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED);
-- 
2.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: Problems trying to bridge/route RoCE
From: Robert LeBlanc @ 2016-11-11 20:45 UTC (permalink / raw)
  To: Majd Dibbiny; +Cc: Parav Pandit, linux-rdma
In-Reply-To: <45EBC19B-6779-4AA6-BEC6-A61103A2C501-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

How do you add GRH for iSER? Does it happen automatically? I thought
that is what default_roce_mode would do. What am I missing here?

My testbed had to be torn down today, so I've got to set it up again
on different hardware. So I won't be able to really test things until
next week, until then I'll try to understand it as much as I can.

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Nov 11, 2016 at 1:34 AM, Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
> On Nov 11, 2016, at 12:33 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>
> I found a ConnectX-3 (non-pro) and wired it up. So in bridge mode, it
> seems like I can get ib_read_bw to work (still with a warm-up error
> message), but as router, I'm still having trouble.
>
> 192.168.21.17 ----- Linux bridge ------ 192.168.21.18
>
> # ib_read_bw -d mlx5_0 -F -a 192.168.21.17
>
> Hi Robert,
>
> You should provide the gid index parameter which adds GRH to the packet in
> order to work with RoCE.
>
> In the perftest suite it's -x parameter.
>
> If you are trying to pass traffic between different subnets, then you need
> to run routable roce traffic and thus using RoCE v2 gid index.
>
> Also, if you are using rdma-cm, you need to configure the rdma-cm default
> gid type to v2 as well using configfs.
>
> ---------------------------------------------------------------------------------------
> Device not recognized to implement inline feature. Disabling it
> ------I
> ---------------------------------------------------------------------------------
>
>                    RDMA_Read BW Test
> Dual-port       : OFF          Device         : mlx5_0
> Number of qps   : 1            Transport type : IB
> Connection type : RC           Using SRQ      : OFF
> TX depth        : 128
> CQ Moderation   : 100
> Mtu             : 1024[B]
> Link type       : Ethernet
> Gid index       : 0
> Outstand reads  : 16
> rdma_cm QPs     : OFF
> Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
> local address: LID 0000 QPN 0x0135 PSN 0x12f108 OUT 0x10 RKey
> 0x009f79 VAddr 0x007f1c82d1f000
> GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:21:18
> remote address: LID 0000 QPN 0x0175 PSN 0x37982e OUT 0x10 RKey
> 0x00eac9 VAddr 0x007f54c1405000
> GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:21:17
> ---------------------------------------------------------------------------------------
> #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
> MsgRate[Mpps]
> Conflicting CPU frequency values detected: 3698.669000 != 3102.661000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.86 differs from nominal 3698.67
> MHz
> 2          1000             0.65               0.65               0.341088
> Conflicting CPU frequency values detected: 3699.310000 != 1199.920000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500.01 differs from nominal 3699.31
> MHz
> 4          1000             0.10               0.10               0.025750
> Conflicting CPU frequency values detected: 3681.579000 != 1199.920000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.99 differs from nominal 3681.58
> MHz
> 8          1000             2.77               2.77               0.363689
> Conflicting CPU frequency values detected: 3602.325000 != 3265.655000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.99 differs from nominal 3602.32
> MHz
> 16         1000             5.37               5.36               0.351569
> Conflicting CPU frequency values detected: 3600.830000 != 3265.655000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.97 differs from nominal 3600.83
> MHz
> 32         1000             11.30              11.29              0.370062
> Conflicting CPU frequency values detected: 3599.761000 != 3265.655000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500.01 differs from nominal 3599.76
> MHz
> 64         1000             22.39              22.28              0.365108
> Conflicting CPU frequency values detected: 3599.975000 != 3265.655000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500.01 differs from nominal 3599.97
> MHz
> 128        1000             45.09              45.08              0.369316
> Conflicting CPU frequency values detected: 3599.761000 != 3265.655000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.99 differs from nominal 3599.76
> MHz
> 256        1000             89.55              89.54              0.366765
> Conflicting CPU frequency values detected: 3599.761000 != 2280.212000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500 differs from nominal 3599.76 MHz
> 512        1000             179.65             179.64             0.367907
> Conflicting CPU frequency values detected: 3599.761000 != 1200.347000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.99 differs from nominal 3599.76
> MHz
> 1024       1000             361.00             360.98             0.369639
> Conflicting CPU frequency values detected: 3601.043000 != 1751.495000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500.01 differs from nominal 3601.04
> MHz
> 2048       1000             492.15             491.42             0.251606
> Conflicting CPU frequency values detected: 3698.028000 != 3601.470000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500.01 differs from nominal 3698.03
> MHz
> 4096       1000             617.10             615.00             0.157440
> Conflicting CPU frequency values detected: 3684.356000 != 3600.189000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3500 differs from nominal 3684.36 MHz
> 8192       1000             679.31             679.30             0.086951
> Conflicting CPU frequency values detected: 3646.759000 != 1877.532000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.98 differs from nominal 3646.76
> MHz
> 16384      1000             722.86             722.85             0.046262
> Conflicting CPU frequency values detected: 3599.975000 != 2271.881000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.99 differs from nominal 3599.97
> MHz
> 32768      1000             742.08             742.08             0.023746
> Conflicting CPU frequency values detected: 3602.966000 != 1933.929000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 3499.97 differs from nominal 3602.97
> MHz
> 65536      1000             763.25             762.52             0.012200
> mlx5: prv-0-18-roberttest.betterservers.com: got completion with error:
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00008813 10000135 4680fcd2
> Problems with warm up
>
>
> === Router config ===
> 192.168.21.17 ------ 192.168.21.11 (Linux router) 192.168.22.11 ------
> 192.168.21.18
>
> #192.168.22.18
> # ping 192.168.21.17
> PING 192.168.21.17 (192.168.21.17) 56(84) bytes of data.
> 64 bytes from 192.168.21.17: icmp_seq=1 ttl=63 time=0.191 ms
> ^C
> --- 192.168.21.17 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.191/0.191/0.191/0.000 ms
>
> #192.168.21.17
> # route -n | grep 168
> 192.168.21.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
> 192.168.22.0    192.168.21.11   255.255.255.0   UG    0      0        0 eth2
>
> #192.168.22.18
> # route -n | grep 168
> 192.168.21.0    192.168.22.11   255.255.255.0   UG    0      0        0 eth2
> 192.168.22.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
>
> #192.168.22.18
> # ib_read_bw -d mlx5_0 -F -a 192.168.21.17
> ---------------------------------------------------------------------------------------
> Device not recognized to implement inline feature. Disabling it
> ---------------------------------------------------------------------------------------
>                    RDMA_Read BW Test
> Dual-port       : OFF          Device         : mlx5_0
> Number of qps   : 1            Transport type : IB
> Connection type : RC           Using SRQ      : OFF
> TX depth        : 128
> CQ Moderation   : 100
> Mtu             : 1024[B]
> Link type       : Ethernet
> Gid index       : 0
> Outstand reads  : 16
> rdma_cm QPs     : OFF
> Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
> local address: LID 0000 QPN 0x013a PSN 0x676912 OUT 0x10 RKey
> 0x00dfd3 VAddr 0x007fe67aee8000
> GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:22:18
> remote address: LID 0000 QPN 0x017a PSN 0x4256ce OUT 0x10 RKey
> 0x012985 VAddr 0x007f59de5bf000
> GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:21:17
> ---------------------------------------------------------------------------------------
> #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
> MsgRate[Mpps]
> Problems with warm up
>
>
> #192.168.21.17
> # cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
> RoCE v2
>
> #192.168.22.18
> # cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
> RoCE v2
>
> With routing, I'm not seeing any RoCE traffic with tcpdump on the
> interfaces. With bridge mode, I do see the RoCE traffic, but it looks
> like RoCE v1 traffic.
>
> [snip]
> 14:55:06.010682 0c:c4:7a:89:f7:06 > 0c:c4:7a:89:f6:f6, ethertype
> Unknown (0x8915), length 78:
>        0x0000:  6010 0000 0018 1b40 0000 0000 0000 0000  `......@........
>        0x0010:  0000 ffff c0a8 1511 0000 0000 0000 0000  ................
>        0x0020:  0000 ffff c0a8 1512 1060 ffff 0000 013e  .........`.....>
>        0x0030:  00e5 7b6c 0000 0411 0000 0000 60bb 6a87  ..{l........`.j.
> [snip]
>
> I can get iSER to kind of work...
>
> In bridge mode and running fio on the iSER target, I'm getting
> messages in dmesg:
> [Thu Nov 10 15:14:17 2016] mlx5_0:dump_cqe:263:(pid 0): dump error cqe
> [Thu Nov 10 15:14:17 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:17 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:17 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:17 2016] 00000000 08007806 2500014f a7a758d2
> [Thu Nov 10 15:14:17 2016] iser: iser_err_comp: memreg failure: memory
> management operation error (6) vend_err 78
> [Thu Nov 10 15:14:17 2016]  connection82:0: detected conn error (1011)
> [Thu Nov 10 15:14:24 2016] mlx5_0:dump_cqe:263:(pid 0): dump error cqe
> [Thu Nov 10 15:14:24 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:24 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:24 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:14:24 2016] 00000000 08007806 25000150 3471eed2
> ...
>
> In routed mode I also get the same messages, but the device goes
> offline and crashes fio
>
> [Thu Nov 10 15:09:13 2016] mlx5_0:dump_cqe:263:(pid 0): dump error cqe
> [Thu Nov 10 15:09:13 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:09:13 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:09:13 2016] 00000000 00000000 00000000 00000000
> [Thu Nov 10 15:09:13 2016] 00000000 08007806 25000149 5a524ad2
> [Thu Nov 10 15:09:13 2016] iser: iser_err_comp: memreg failure: memory
> management operation error (6) vend_err 78
> [Thu Nov 10 15:09:13 2016]  connection80:0: detected conn error (1011)
> [Thu Nov 10 15:09:18 2016]  session80: session recovery timed out after 5
> secs
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: rejecting I/O to offline device
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] killing request
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: rejecting I/O to offline device
> [Thu Nov 10 15:09:18 2016] scsi_io_completion: 23 callbacks suppressed
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] FAILED Result:
> hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] CDB: Read(10) 28 00 09
> 9f 97 18 00 01 48 00
> [Thu Nov 10 15:09:18 2016] blk_update_request: 23 callbacks suppressed
> [Thu Nov 10 15:09:18 2016] blk_update_request: I/O error, dev sdab,
> sector 161453848
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] killing request
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: rejecting I/O to offline device
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] FAILED Result:
> hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] CDB: Read(10) 28 00 07
> bf 98 60 00 00 a8 00
> [Thu Nov 10 15:09:18 2016] blk_update_request: I/O error, dev sdab,
> sector 129996896
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] killing request
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: rejecting I/O to offline device
> [Thu Nov 10 15:09:18 2016] sd 13:0:0:0: [sdab] killing request
> ...
>
> This is all using ConnectX-4 LX cards on the target and initiator and
> the 3.8.5 kernel.
>
> Any ideas of what may be causing these issues?
>
> Thank you,
> Robert LeBlanc
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Nov 3, 2016 at 11:38 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
> wrote:
>
> That box has a build-in ConnectX-3 card that we aren't using in this
>
> test so the mlx4 modules are loaded. I unloaded mlx4_ib, no luck. I
>
> also tried to unload the mlx5_ib driver and it also unloaded mlx5_core
>
> and my interfaces were gone. It seems like I can't only unload
>
> mlx5_ib.
>
>
> With mlx4_ib unloaded I still can't rping or ib_read_bw (connects, but
>
> get messages like:
>
> ethernet_read_keys: Couldn't read remote address
>
> Unable to read to socket/rdam_cm
>
> Failed to exchange data between server and clients
>
> Problems with warm up) same as before.
>
> ----------------
>
> Robert LeBlanc
>
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
>
> On Thu, Nov 3, 2016 at 11:16 AM, Parav Pandit <pandit.parav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> wrote:
>
> Hi Robert,
>
>
> Can you please unload the mlx4_ib module in the bridge/router box and
>
> give it a quick try?
>
>
> Parav
>
>
> On Thu, Nov 3, 2016 at 10:32 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
> wrote:
>
> I'm trying to do some testing of RoCE v2 and so I put a LInux box
>
> between two RoCE  machines. I think the ConnectX-4 Lx card in the
>
> bridge/router is intercepting the RoCE traffic and so it is not being
>
> bridged/routed. I don't see any traffic using tcpdump which seems to
>
> confirm this. I thought I could change the UDP port that the card is
>
> looking for RoCE traffic to something not in use [0], but rr_proto is
>
> not a valid parameter for the inbox mlx5_core module on 4.8.5. I can
>
> ping across the bridge/router so I know that it is setup correctly,
>
> just RDMA is not working.
>
>
> Any ideas on how to pass RoCE traffic like normal traffic? The reason
>
> we are using a Linux box is that we can use netem to understand how
>
> RoCE behaves in different situations.
>
>
> [0] https://community.mellanox.com/docs/DOC-1444
>
>
> Thank you
>
> ----------------
>
> Robert LeBlanc
>
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
> --
>
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-core 0/8] libpvrdma: userspace library for PVRDMA
From: Adit Ranadive @ 2016-11-11 19:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, pv-drivers
In-Reply-To: <20161109175142.GA13467-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Wed, Nov 09, 2016 at 10:51:42AM -0700, Jason Gunthorpe wrote:
> On Wed, Nov 09, 2016 at 09:39:49AM -0800, Adit Ranadive wrote:
> > On Tue, Nov 08, 2016 at 5:17:15PM -0800, Christoph Hellwig wrote:
> > > FYI, the convention used in scsi is vmw_pvscsi.c, so naming the
> > > RDMA equivalent vmw_pvrdma would make a lot of sense and still
> > > be reasonably short.
> > 
> > Thanks Christoph. We agree with that as well. Then the driver path
> > (drivers/infiniband/hw/vmw_pvrdma/...) and the module name would
> > reflect vmw_pvrdma.
> > What about the user library? libpvrdma -> libvmwpvrdma?
> 
> I am encouraging people to match the name of the module in the
> library.
>
> Jason
> 

Hi Doug,

We would like to get the driver in 4.10 if its possible. I can 
respin the patches (with the new driver name/location) based on your
for-4.10 branch or another branch if you prefer.

Thanks,
Adit
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR
From: Julia Lawall @ 2016-11-11 19:04 UTC (permalink / raw)
  To: Christian Benvenuti
  Cc: kernel-janitors, Dave Goodell, Doug Ledford, Sean Hefty,
	Hal Rosenstock, linux-rdma, linux-kernel, Christophe JAILLET

The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a
valid pointer, never NULL.  The same is true of get_qp_res_chunk, which
just returns the result of calling usnic_ib_qp_grp_get_chunk.  Simplify
IS_ERR_OR_NULL to IS_ERR in both cases.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,e;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ IS_ERR(t)

@@
expression t,e,e1;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ PTR_ERR(t)
... when any
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c |   12 ++++++------
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |   12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c b/drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c
index 0e813ec..092d4e1 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c
@@ -117,10 +117,10 @@ static int enable_qp_grp(struct usnic_ib_qp_grp *qp_grp)
 	vnic_idx = usnic_vnic_get_index(qp_grp->vf->vnic);
 
 	res_chunk = get_qp_res_chunk(qp_grp);
-	if (IS_ERR_OR_NULL(res_chunk)) {
+	if (IS_ERR(res_chunk)) {
 		usnic_err("Unable to get qp res with err %ld\n",
 				PTR_ERR(res_chunk));
-		return res_chunk ? PTR_ERR(res_chunk) : -ENOMEM;
+		return PTR_ERR(res_chunk);
 	}
 
 	for (i = 0; i < res_chunk->cnt; i++) {
@@ -158,10 +158,10 @@ static int disable_qp_grp(struct usnic_ib_qp_grp *qp_grp)
 	vnic_idx = usnic_vnic_get_index(qp_grp->vf->vnic);
 
 	res_chunk = get_qp_res_chunk(qp_grp);
-	if (IS_ERR_OR_NULL(res_chunk)) {
+	if (IS_ERR(res_chunk)) {
 		usnic_err("Unable to get qp res with err %ld\n",
 			PTR_ERR(res_chunk));
-		return res_chunk ? PTR_ERR(res_chunk) : -ENOMEM;
+		return PTR_ERR(res_chunk);
 	}
 
 	for (i = 0; i < res_chunk->cnt; i++) {
@@ -186,11 +186,11 @@ static int init_filter_action(struct usnic_ib_qp_grp *qp_grp,
 	struct usnic_vnic_res_chunk *res_chunk;
 
 	res_chunk = usnic_ib_qp_grp_get_chunk(qp_grp, USNIC_VNIC_RES_TYPE_RQ);
-	if (IS_ERR_OR_NULL(res_chunk)) {
+	if (IS_ERR(res_chunk)) {
 		usnic_err("Unable to get %s with err %ld\n",
 			usnic_vnic_res_type_to_str(USNIC_VNIC_RES_TYPE_RQ),
 			PTR_ERR(res_chunk));
-		return res_chunk ? PTR_ERR(res_chunk) : -ENOMEM;
+		return PTR_ERR(res_chunk);
 	}
 
 	uaction->vnic_idx = usnic_vnic_get_index(qp_grp->vf->vnic);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index a5bfbba..79766db 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -87,12 +87,12 @@ static int usnic_ib_fill_create_qp_resp(struct usnic_ib_qp_grp *qp_grp,
 	resp.bar_len = bar->len;
 
 	chunk = usnic_ib_qp_grp_get_chunk(qp_grp, USNIC_VNIC_RES_TYPE_RQ);
-	if (IS_ERR_OR_NULL(chunk)) {
+	if (IS_ERR(chunk)) {
 		usnic_err("Failed to get chunk %s for qp_grp %d with err %ld\n",
 			usnic_vnic_res_type_to_str(USNIC_VNIC_RES_TYPE_RQ),
 			qp_grp->grp_id,
 			PTR_ERR(chunk));
-		return chunk ? PTR_ERR(chunk) : -ENOMEM;
+		return PTR_ERR(chunk);
 	}
 
 	WARN_ON(chunk->type != USNIC_VNIC_RES_TYPE_RQ);
@@ -101,12 +101,12 @@ static int usnic_ib_fill_create_qp_resp(struct usnic_ib_qp_grp *qp_grp,
 		resp.rq_idx[i] = chunk->res[i]->vnic_idx;
 
 	chunk = usnic_ib_qp_grp_get_chunk(qp_grp, USNIC_VNIC_RES_TYPE_WQ);
-	if (IS_ERR_OR_NULL(chunk)) {
+	if (IS_ERR(chunk)) {
 		usnic_err("Failed to get chunk %s for qp_grp %d with err %ld\n",
 			usnic_vnic_res_type_to_str(USNIC_VNIC_RES_TYPE_WQ),
 			qp_grp->grp_id,
 			PTR_ERR(chunk));
-		return chunk ? PTR_ERR(chunk) : -ENOMEM;
+		return PTR_ERR(chunk);
 	}
 
 	WARN_ON(chunk->type != USNIC_VNIC_RES_TYPE_WQ);
@@ -115,12 +115,12 @@ static int usnic_ib_fill_create_qp_resp(struct usnic_ib_qp_grp *qp_grp,
 		resp.wq_idx[i] = chunk->res[i]->vnic_idx;
 
 	chunk = usnic_ib_qp_grp_get_chunk(qp_grp, USNIC_VNIC_RES_TYPE_CQ);
-	if (IS_ERR_OR_NULL(chunk)) {
+	if (IS_ERR(chunk)) {
 		usnic_err("Failed to get chunk %s for qp_grp %d with err %ld\n",
 			usnic_vnic_res_type_to_str(USNIC_VNIC_RES_TYPE_CQ),
 			qp_grp->grp_id,
 			PTR_ERR(chunk));
-		return chunk ? PTR_ERR(chunk) : -ENOMEM;
+		return PTR_ERR(chunk);
 	}
 
 	WARN_ON(chunk->type != USNIC_VNIC_RES_TYPE_CQ);

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox