* [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Salil Mehta @ 2016-11-04 16:36 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
netdev, linux-kernel, linuxarm, Ping Zhang
In-Reply-To: <20161104163633.141880-1-salil.mehta@huawei.com>
From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
This patch modifies the memory allocation logic in the hns RoCE
driver. It uses kcalloc instead of kmalloc_array followed by
bitmap_zero, and falls back to vzalloc when kcalloc fails.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Ping Zhang <zhangping5@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_mr.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
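A minimal userspace sketch of the fallback described in the commit message, not the driver code itself. Here calloc() stands in for both kcalloc() and vzalloc(), and the names bits_to_longs, zalloc_fn, primary_zalloc, failing_zalloc and alloc_order_bitmap are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for BITS_TO_LONGS(): longs needed to hold nbits. */
static size_t bits_to_longs(size_t nbits)
{
	const size_t bits_per_long = sizeof(long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long;
}

/* Hypothetical allocator type: returns zeroed memory or NULL. */
typedef void *(*zalloc_fn)(size_t n, size_t size);

/* Stand-in for a zeroing allocator that succeeds; simply calloc(). */
static void *primary_zalloc(size_t n, size_t size)
{
	return calloc(n, size);
}

/* Stand-in for an allocator that fails, to exercise the fallback. */
static void *failing_zalloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/*
 * Allocate the zeroed bitmap for one buddy order: try the primary
 * allocator first and use the fallback only if that fails, which is
 * the kcalloc()-then-vzalloc() order used in the patch.
 */
static unsigned long *alloc_order_bitmap(int max_order, int order,
					 zalloc_fn primary, zalloc_fn fallback)
{
	unsigned long *bits;
	size_t nlongs;

	assert(max_order >= 0 && max_order < 31);
	assert(order >= 0 && order <= max_order);

	nlongs = bits_to_longs((size_t)1 << (max_order - order));
	bits = primary(nlongs, sizeof(long));
	if (!bits)
		bits = fallback(nlongs, sizeof(long));

	return bits;
}
```

In the kernel the two allocators return different kinds of memory, which is why the patch also switches the free paths to kvfree(); in this sketch plain free() is enough for either path.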
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index fb87883..d3dfb5f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct hns_roce_buddy *buddy, int max_order)
for (i = 0; i <= buddy->max_order; ++i) {
s = BITS_TO_LONGS(1 << (buddy->max_order - i));
- buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
- if (!buddy->bits[i])
- goto err_out_free;
-
- bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
+ buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
+ if (!buddy->bits[i]) {
+ buddy->bits[i] = vzalloc(s * sizeof(long));
+ if (!buddy->bits[i])
+ goto err_out_free;
+ }
}
set_bit(0, buddy->bits[buddy->max_order]);
@@ -151,7 +152,7 @@ static int hns_roce_buddy_init(struct hns_roce_buddy *buddy, int max_order)
err_out_free:
for (i = 0; i <= buddy->max_order; ++i)
- kfree(buddy->bits[i]);
+ kvfree(buddy->bits[i]);
err_out:
kfree(buddy->bits);
@@ -164,7 +165,7 @@ static void hns_roce_buddy_cleanup(struct hns_roce_buddy *buddy)
int i;
for (i = 0; i <= buddy->max_order; ++i)
- kfree(buddy->bits[i]);
+ kvfree(buddy->bits[i]);
kfree(buddy->bits);
kfree(buddy->num_free);
--
1.7.9.5
^ permalink raw reply related
* [PATCH for-next 02/11] IB/hns: Add code for refreshing CQ CI using TPTR
From: Salil Mehta @ 2016-11-04 16:36 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
netdev, linux-kernel, linuxarm, Dongdong Huang
In-Reply-To: <20161104163633.141880-1-salil.mehta@huawei.com>
From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
This patch adds the code for refreshing the CQ CI using TPTR on the
hip06 SoC.
The driver sends a doorbell to hardware to refresh the CQ CI when the
user successfully polls a CQE, but this fails if the doorbell has been
blocked. Hardware therefore reads a special buffer called TPTR to get
the latest CI value when the CQ is almost full.
This patch supports the special CI buffer as follows:
a) Allocate the memory for TPTR in the hns_roce_tptr_init function and
free it in the hns_roce_tptr_free function. These two functions are
called in the probe function and in the remove function.
b) Add the code for computing the offset (every CQ needs 2 bytes) and
writing the DMA address to every CQ context to notify hardware, in
the function named hns_roce_v1_write_cqc.
c) Add code for mapping the TPTR buffer to user space in the function
named hns_roce_mmap. On hip06 the mapping distinguishes TPTR from the
user-mode UAR by vm_pgoff (0: UAR, 1: TPTR, others: invalid).
d) Add the code for refreshing the CQ CI using TPTR in the function
named hns_roce_v1_poll_cq.
e) Add some variable definitions to the related structures.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_common.h | 2 -
drivers/infiniband/hw/hns/hns_roce_cq.c | 9 +++
drivers/infiniband/hw/hns/hns_roce_device.h | 6 +-
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 79 ++++++++++++++++++++++++---
drivers/infiniband/hw/hns/hns_roce_hw_v1.h | 9 +++
drivers/infiniband/hw/hns/hns_roce_main.c | 13 ++++-
6 files changed, 103 insertions(+), 15 deletions(-)
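The arithmetic behind items b) and d) can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the driver code: the helper names (tptr_offset, tptr_addr_low, tptr_addr_high, tptr_ci) are hypothetical, the CQ depth is assumed to be a power of two, and the driver additionally masks the high address part to the width of the context field.

```c
#include <assert.h>
#include <stdint.h>

#define TPTR_ENTRY_SIZE 2u /* mirrors HNS_ROCE_V1_TPTR_ENTRY_SIZE */

/* Byte offset of one CQ's tptr slot inside the shared TPTR buffer. */
static uint32_t tptr_offset(uint32_t cqn)
{
	return cqn * TPTR_ENTRY_SIZE;
}

/* Part stored in cqe_tptr_addr_l: DMA address bits 12..43. */
static uint32_t tptr_addr_low(uint64_t dma_addr)
{
	return (uint32_t)(dma_addr >> 12);
}

/* Part stored in the cqc_byte_20 field: DMA address bits 44 and up. */
static uint32_t tptr_addr_high(uint64_t dma_addr)
{
	return (uint32_t)(dma_addr >> 44);
}

/* Value written to the tptr slot: consumer index wrapped to 2 * depth. */
static uint16_t tptr_ci(uint32_t cons_index, uint32_t cq_depth)
{
	/* The mask trick is only valid for a power-of-two depth. */
	assert(cq_depth != 0 && (cq_depth & (cq_depth - 1)) == 0);

	return (uint16_t)(cons_index & ((cq_depth << 1) - 1));
}
```

For example, with a depth of 32 the stored index runs from 0 to 63 and then wraps, so a consumer index of 70 is written as 6.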
diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 2970161..0dcb620 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -253,8 +253,6 @@
#define ROCEE_VENDOR_ID_REG 0x0
#define ROCEE_VENDOR_PART_ID_REG 0x4
-#define ROCEE_HW_VERSION_REG 0x8
-
#define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
#define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 0973659..5dc8d92 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -349,6 +349,15 @@ struct ib_cq *hns_roce_ib_create_cq(struct ib_device *ib_dev,
goto err_mtt;
}
+ /*
+ * For the QP created by kernel space, tptr value should be initialized
+ * to zero; For the QP created by user space, it will cause synchronous
+ * problems if tptr is set to zero here, so we initialize it in user
+ * space.
+ */
+ if (!context)
+ *hr_cq->tptr_addr = 0;
+
/* Get created cq handler and carry out event */
hr_cq->comp = hns_roce_ib_cq_comp;
hr_cq->event = hns_roce_ib_cq_event;
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..7242b14 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -37,6 +37,8 @@
#define DRV_NAME "hns_roce"
+#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
#define MAC_ADDR_OCTET_NUM 6
#define HNS_ROCE_MAX_MSG_LEN 0x80000000
@@ -296,7 +298,7 @@ struct hns_roce_cq {
u32 cq_depth;
u32 cons_index;
void __iomem *cq_db_l;
- void __iomem *tptr_addr;
+ u16 *tptr_addr;
unsigned long cqn;
u32 vector;
atomic_t refcount;
@@ -553,6 +555,8 @@ struct hns_roce_dev {
int cmd_mod;
int loop_idc;
+ dma_addr_t tptr_dma_addr; /*only for hw v1*/
+ u32 tptr_size; /*only for hw v1*/
struct hns_roce_hw *hw;
};
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index ca8b784..7750d0d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -849,6 +849,45 @@ static void hns_roce_bt_free(struct hns_roce_dev *hr_dev)
priv->bt_table.qpc_buf.buf, priv->bt_table.qpc_buf.map);
}
+static int hns_roce_tptr_init(struct hns_roce_dev *hr_dev)
+{
+ struct device *dev = &hr_dev->pdev->dev;
+ struct hns_roce_buf_list *tptr_buf;
+ struct hns_roce_v1_priv *priv;
+
+ priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+ tptr_buf = &priv->tptr_table.tptr_buf;
+
+ /*
+ * This buffer will be used for CQ's tptr(tail pointer), also
+ * named ci (consumer index). Every CQ will use 2 bytes to save
+ * cqe ci in hip06. Hardware will read this area to get new ci
+ * when the queue is almost full.
+ */
+ tptr_buf->buf = dma_alloc_coherent(dev, HNS_ROCE_V1_TPTR_BUF_SIZE,
+ &tptr_buf->map, GFP_KERNEL);
+ if (!tptr_buf->buf)
+ return -ENOMEM;
+
+ hr_dev->tptr_dma_addr = tptr_buf->map;
+ hr_dev->tptr_size = HNS_ROCE_V1_TPTR_BUF_SIZE;
+
+ return 0;
+}
+
+static void hns_roce_tptr_free(struct hns_roce_dev *hr_dev)
+{
+ struct device *dev = &hr_dev->pdev->dev;
+ struct hns_roce_buf_list *tptr_buf;
+ struct hns_roce_v1_priv *priv;
+
+ priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+ tptr_buf = &priv->tptr_table.tptr_buf;
+
+ dma_free_coherent(dev, HNS_ROCE_V1_TPTR_BUF_SIZE,
+ tptr_buf->buf, tptr_buf->map);
+}
+
/**
* hns_roce_v1_reset - reset RoCE
* @hr_dev: RoCE device struct pointer
@@ -906,12 +945,11 @@ void hns_roce_v1_profile(struct hns_roce_dev *hr_dev)
hr_dev->vendor_id = le32_to_cpu(roce_read(hr_dev, ROCEE_VENDOR_ID_REG));
hr_dev->vendor_part_id = le32_to_cpu(roce_read(hr_dev,
ROCEE_VENDOR_PART_ID_REG));
- hr_dev->hw_rev = le32_to_cpu(roce_read(hr_dev, ROCEE_HW_VERSION_REG));
-
hr_dev->sys_image_guid = le32_to_cpu(roce_read(hr_dev,
ROCEE_SYS_IMAGE_GUID_L_REG)) |
((u64)le32_to_cpu(roce_read(hr_dev,
ROCEE_SYS_IMAGE_GUID_H_REG)) << 32);
+ hr_dev->hw_rev = HNS_ROCE_HW_VER1;
caps->num_qps = HNS_ROCE_V1_MAX_QP_NUM;
caps->max_wqes = HNS_ROCE_V1_MAX_WQE_NUM;
@@ -1009,8 +1047,17 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
goto error_failed_bt_init;
}
+ ret = hns_roce_tptr_init(hr_dev);
+ if (ret) {
+ dev_err(dev, "tptr init failed!\n");
+ goto error_failed_tptr_init;
+ }
+
return 0;
+error_failed_tptr_init:
+ hns_roce_bt_free(hr_dev);
+
error_failed_bt_init:
hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
hns_roce_raq_free(hr_dev);
@@ -1022,6 +1069,7 @@ int hns_roce_v1_init(struct hns_roce_dev *hr_dev)
void hns_roce_v1_exit(struct hns_roce_dev *hr_dev)
{
+ hns_roce_tptr_free(hr_dev);
hns_roce_bt_free(hr_dev);
hns_roce_port_enable(hr_dev, HNS_ROCE_PORT_DOWN);
hns_roce_raq_free(hr_dev);
@@ -1339,14 +1387,21 @@ void hns_roce_v1_write_cqc(struct hns_roce_dev *hr_dev,
dma_addr_t dma_handle, int nent, u32 vector)
{
struct hns_roce_cq_context *cq_context = NULL;
- void __iomem *tptr_addr;
+ struct hns_roce_buf_list *tptr_buf;
+ struct hns_roce_v1_priv *priv;
+ dma_addr_t tptr_dma_addr;
+ int offset;
+
+ priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+ tptr_buf = &priv->tptr_table.tptr_buf;
cq_context = mb_buf;
memset(cq_context, 0, sizeof(*cq_context));
- tptr_addr = 0;
- hr_dev->priv_addr = tptr_addr;
- hr_cq->tptr_addr = tptr_addr;
+ /* Get the tptr for this CQ. */
+ offset = hr_cq->cqn * HNS_ROCE_V1_TPTR_ENTRY_SIZE;
+ tptr_dma_addr = tptr_buf->map + offset;
+ hr_cq->tptr_addr = (u16 *)(tptr_buf->buf + offset);
/* Register cq_context members */
roce_set_field(cq_context->cqc_byte_4,
@@ -1390,10 +1445,10 @@ void hns_roce_v1_write_cqc(struct hns_roce_dev *hr_dev,
roce_set_field(cq_context->cqc_byte_20,
CQ_CONTEXT_CQC_BYTE_20_CQE_TPTR_ADDR_H_M,
CQ_CONTEXT_CQC_BYTE_20_CQE_TPTR_ADDR_H_S,
- (u64)tptr_addr >> 44);
+ tptr_dma_addr >> 44);
cq_context->cqc_byte_20 = cpu_to_le32(cq_context->cqc_byte_20);
- cq_context->cqe_tptr_addr_l = (u32)((u64)tptr_addr >> 12);
+ cq_context->cqe_tptr_addr_l = (u32)(tptr_dma_addr >> 12);
roce_set_field(cq_context->cqc_byte_32,
CQ_CONTEXT_CQC_BYTE_32_CUR_CQE_BA1_H_M,
@@ -1659,8 +1714,14 @@ int hns_roce_v1_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
break;
}
- if (npolled)
+ if (npolled) {
+ *hr_cq->tptr_addr = hr_cq->cons_index &
+ ((hr_cq->cq_depth << 1) - 1);
+
+ /* Memory barrier */
+ wmb();
hns_roce_v1_cq_set_ci(hr_cq, hr_cq->cons_index);
+ }
spin_unlock_irqrestore(&hr_cq->lock, flags);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index 2e1878b..6004c7f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -104,6 +104,10 @@
#define HNS_ROCE_BT_RSV_BUF_SIZE (1 << 17)
+#define HNS_ROCE_V1_TPTR_ENTRY_SIZE 2
+#define HNS_ROCE_V1_TPTR_BUF_SIZE \
+ (HNS_ROCE_V1_TPTR_ENTRY_SIZE * HNS_ROCE_V1_MAX_CQ_NUM)
+
#define HNS_ROCE_ODB_POLL_MODE 0
#define HNS_ROCE_SDB_NORMAL_MODE 0
@@ -983,10 +987,15 @@ struct hns_roce_bt_table {
struct hns_roce_buf_list cqc_buf;
};
+struct hns_roce_tptr_table {
+ struct hns_roce_buf_list tptr_buf;
+};
+
struct hns_roce_v1_priv {
struct hns_roce_db_table db_table;
struct hns_roce_raq_table raq_table;
struct hns_roce_bt_table bt_table;
+ struct hns_roce_tptr_table tptr_table;
};
int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset);
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 764e35a..6770171 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -549,6 +549,8 @@ static int hns_roce_dealloc_ucontext(struct ib_ucontext *ibcontext)
static int hns_roce_mmap(struct ib_ucontext *context,
struct vm_area_struct *vma)
{
+ struct hns_roce_dev *hr_dev = to_hr_dev(context->device);
+
if (((vma->vm_end - vma->vm_start) % PAGE_SIZE) != 0)
return -EINVAL;
@@ -558,10 +560,15 @@ static int hns_roce_mmap(struct ib_ucontext *context,
to_hr_ucontext(context)->uar.pfn,
PAGE_SIZE, vma->vm_page_prot))
return -EAGAIN;
-
- } else {
+ } else if (vma->vm_pgoff == 1 && hr_dev->hw_rev == HNS_ROCE_HW_VER1) {
+ /* vm_pgoff: 1 -- TPTR */
+ if (io_remap_pfn_range(vma, vma->vm_start,
+ hr_dev->tptr_dma_addr >> PAGE_SHIFT,
+ hr_dev->tptr_size,
+ vma->vm_page_prot))
+ return -EAGAIN;
+ } else
return -EINVAL;
- }
return 0;
}
--
1.7.9.5
^ permalink raw reply related
* [PATCH for-next 01/11] IB/hns: Add the interface for querying QP1
From: Salil Mehta @ 2016-11-04 16:36 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
netdev, linux-kernel, linuxarm
In-Reply-To: <20161104163633.141880-1-salil.mehta@huawei.com>
From: Lijun Ou <oulijun@huawei.com>
The old code only provided an interface for querying non-special
QPs. This patch adds an interface for querying QP1.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 87 +++++++++++++++++++++++++++-
drivers/infiniband/hw/hns/hns_roce_hw_v1.h | 6 +-
2 files changed, 90 insertions(+), 3 deletions(-)
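A minimal userspace sketch of two mechanisms used in the diff below: extracting a bit field with a mask and shift pair, and choosing the query path from the QP number. The two macro pairs are copied from the driver header; get_field is a hypothetical stand-in for roce_get_field (assumed here to mask and then shift down), and pick_query_path and enum query_path are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: QP state occupies the low 3 bits of qp1c_bytes_4. */
#define QP1C_BYTES_4_QP_STATE_S 0
#define QP1C_BYTES_4_QP_STATE_M \
	(((1UL << 3) - 1) << QP1C_BYTES_4_QP_STATE_S)

/* Existing neighbour in the header: a 4-bit field starting at bit 8. */
#define QP1C_BYTES_4_SQ_WQE_SHIFT_S 8
#define QP1C_BYTES_4_SQ_WQE_SHIFT_M \
	(((1UL << 4) - 1) << QP1C_BYTES_4_SQ_WQE_SHIFT_S)

/* Hypothetical stand-in for roce_get_field(): mask, then shift down. */
static uint32_t get_field(uint32_t word, unsigned long mask,
			  unsigned int shift)
{
	assert(shift < 32);

	return (uint32_t)((word & mask) >> shift);
}

enum query_path { QUERY_SQP, QUERY_QP };

/* Mirrors the new hns_roce_v1_query_qp(): QPN 0 and 1 take the SQP path. */
static enum query_path pick_query_path(uint32_t doorbell_qpn)
{
	return doorbell_qpn <= 1 ? QUERY_SQP : QUERY_QP;
}
```

With a context word of 0xA03, the state field reads 3 and the SQ WQE shift field reads 10.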
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 71232e5..ca8b784 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2630,8 +2630,82 @@ static int hns_roce_v1_query_qpc(struct hns_roce_dev *hr_dev,
return ret;
}
-int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
- int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr)
+static int hns_roce_v1_q_sqp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+ int qp_attr_mask,
+ struct ib_qp_init_attr *qp_init_attr)
+{
+ struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
+ struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+ struct hns_roce_sqp_context *context;
+ u32 addr;
+
+ context = kzalloc(sizeof(*context), GFP_KERNEL);
+ if (!context)
+ return -ENOMEM;
+
+ mutex_lock(&hr_qp->mutex);
+
+ if (hr_qp->state == IB_QPS_RESET) {
+ qp_attr->qp_state = IB_QPS_RESET;
+ goto done;
+ }
+
+ addr = ROCEE_QP1C_CFG0_0_REG + hr_qp->port * sizeof(*context);
+ context->qp1c_bytes_4 = roce_read(hr_dev, addr);
+ context->sq_rq_bt_l = roce_read(hr_dev, addr + 1);
+ context->qp1c_bytes_12 = roce_read(hr_dev, addr + 2);
+ context->qp1c_bytes_16 = roce_read(hr_dev, addr + 3);
+ context->qp1c_bytes_20 = roce_read(hr_dev, addr + 4);
+ context->cur_rq_wqe_ba_l = roce_read(hr_dev, addr + 5);
+ context->qp1c_bytes_28 = roce_read(hr_dev, addr + 6);
+ context->qp1c_bytes_32 = roce_read(hr_dev, addr + 7);
+ context->cur_sq_wqe_ba_l = roce_read(hr_dev, addr + 8);
+ context->qp1c_bytes_40 = roce_read(hr_dev, addr + 9);
+
+ hr_qp->state = roce_get_field(context->qp1c_bytes_4,
+ QP1C_BYTES_4_QP_STATE_M,
+ QP1C_BYTES_4_QP_STATE_S);
+ qp_attr->qp_state = hr_qp->state;
+ qp_attr->path_mtu = IB_MTU_256;
+ qp_attr->path_mig_state = IB_MIG_ARMED;
+ qp_attr->qkey = QKEY_VAL;
+ qp_attr->rq_psn = 0;
+ qp_attr->sq_psn = 0;
+ qp_attr->dest_qp_num = 1;
+ qp_attr->qp_access_flags = 6;
+
+ qp_attr->pkey_index = roce_get_field(context->qp1c_bytes_20,
+ QP1C_BYTES_20_PKEY_IDX_M,
+ QP1C_BYTES_20_PKEY_IDX_S);
+ qp_attr->port_num = hr_qp->port + 1;
+ qp_attr->sq_draining = 0;
+ qp_attr->max_rd_atomic = 0;
+ qp_attr->max_dest_rd_atomic = 0;
+ qp_attr->min_rnr_timer = 0;
+ qp_attr->timeout = 0;
+ qp_attr->retry_cnt = 0;
+ qp_attr->rnr_retry = 0;
+ qp_attr->alt_timeout = 0;
+
+done:
+ qp_attr->cur_qp_state = qp_attr->qp_state;
+ qp_attr->cap.max_recv_wr = hr_qp->rq.wqe_cnt;
+ qp_attr->cap.max_recv_sge = hr_qp->rq.max_gs;
+ qp_attr->cap.max_send_wr = hr_qp->sq.wqe_cnt;
+ qp_attr->cap.max_send_sge = hr_qp->sq.max_gs;
+ qp_attr->cap.max_inline_data = 0;
+ qp_init_attr->cap = qp_attr->cap;
+ qp_init_attr->create_flags = 0;
+
+ mutex_unlock(&hr_qp->mutex);
+ kfree(context);
+
+ return 0;
+}
+
+static int hns_roce_v1_q_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+ int qp_attr_mask,
+ struct ib_qp_init_attr *qp_init_attr)
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
@@ -2767,6 +2841,15 @@ int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
return ret;
}
+int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+ int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr)
+{
+ struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+
+ return hr_qp->doorbell_qpn <= 1 ?
+ hns_roce_v1_q_sqp(ibqp, qp_attr, qp_attr_mask, qp_init_attr) :
+ hns_roce_v1_q_qp(ibqp, qp_attr, qp_attr_mask, qp_init_attr);
+}
static void hns_roce_v1_destroy_qp_common(struct hns_roce_dev *hr_dev,
struct hns_roce_qp *hr_qp,
int is_user)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index 539b0a3b..2e1878b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -480,13 +480,17 @@ struct hns_roce_sqp_context {
u32 qp1c_bytes_12;
u32 qp1c_bytes_16;
u32 qp1c_bytes_20;
- u32 qp1c_bytes_28;
u32 cur_rq_wqe_ba_l;
+ u32 qp1c_bytes_28;
u32 qp1c_bytes_32;
u32 cur_sq_wqe_ba_l;
u32 qp1c_bytes_40;
};
+#define QP1C_BYTES_4_QP_STATE_S 0
+#define QP1C_BYTES_4_QP_STATE_M \
+ (((1UL << 3) - 1) << QP1C_BYTES_4_QP_STATE_S)
+
#define QP1C_BYTES_4_SQ_WQE_SHIFT_S 8
#define QP1C_BYTES_4_SQ_WQE_SHIFT_M \
(((1UL << 4) - 1) << QP1C_BYTES_4_SQ_WQE_SHIFT_S)
--
1.7.9.5
^ permalink raw reply related
* [PATCH for-next 00/11] Code improvements & fixes for HNS RoCE driver
From: Salil Mehta @ 2016-11-04 16:36 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
netdev, linux-kernel, linuxarm
This patchset introduces some code improvements and fixes
for the identified problems in the HNS RoCE driver.
Lijun Ou (4):
IB/hns: Add the interface for querying QP1
IB/hns: add self loopback for CM
IB/hns: Modify the condition of notifying hardware loopback
IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()
Salil Mehta (1):
IB/hns: Fix for Checkpatch.pl comment style errors
Shaobo Xu (1):
IB/hns: Implement the add_gid/del_gid and optimize the GIDs
management
Wei Hu (Xavier) (5):
IB/hns: Add code for refreshing CQ CI using TPTR
IB/hns: Optimize the logic of allocating memory using APIs
IB/hns: Modify the macro for the timeout when cmd process
IB/hns: Modify query info named port_num when querying RC QP
IB/hns: Change qpn allocation to round-robin mode.
drivers/infiniband/hw/hns/hns_roce_alloc.c | 11 +-
drivers/infiniband/hw/hns/hns_roce_cmd.c | 8 +-
drivers/infiniband/hw/hns/hns_roce_cmd.h | 7 +-
drivers/infiniband/hw/hns/hns_roce_common.h | 2 -
drivers/infiniband/hw/hns/hns_roce_cq.c | 17 +-
drivers/infiniband/hw/hns/hns_roce_device.h | 45 ++--
drivers/infiniband/hw/hns/hns_roce_eq.c | 6 +-
drivers/infiniband/hw/hns/hns_roce_hem.c | 6 +-
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 271 +++++++++++++++++------
drivers/infiniband/hw/hns/hns_roce_hw_v1.h | 17 +-
drivers/infiniband/hw/hns/hns_roce_main.c | 311 +++++++--------------------
drivers/infiniband/hw/hns/hns_roce_mr.c | 21 +-
drivers/infiniband/hw/hns/hns_roce_pd.c | 5 +-
drivers/infiniband/hw/hns/hns_roce_qp.c | 2 +-
14 files changed, 367 insertions(+), 362 deletions(-)
--
1.7.9.5
^ permalink raw reply
* Crash in mlx4 shutdown with 4.9-rc3
From: Steve Wise @ 2016-11-04 14:29 UTC (permalink / raw)
To: yishaih-VPRAkNaXOzVWk0Htik3J/w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hey Yishai, is this by chance a known bug with a pending fix somewhere? I'm
seeing it frequently when shutting down. I'm using 4.9-rc3 with memory
debugging enabled...
[59984.502834] mlx4_core 0000:81:00.0: mlx4_shutdown was called
[59984.603599] mlx4_en 0000:81:00.0: removed PHC
[59985.145590] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[59985.151990] Modules linked in: uio_pci_generic uio iw_cxgb4 cxgb4 nvmet_rdma
nvmet null_blk brd rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib
rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash
dm_log dm_mod intel_rapl iosf_mbi sb_edac edac_core x86_pkg_temp_thermal
coretemp ext4 kvm jbd2 irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel mbcache aesni_intel lrw gf128mul iTCO_wdt glue_helper mei_me
iTCO_vendor_support ablk_helper cryptd mxm_wmi ipmi_si i2c_i801 lpc_ich mei sg
nfsd mfd_core i2c_smbus ipmi_msghandler pcspkr shpchp auth_rpcgss wmi nfs_acl
lockd grace sunrpc ip_tables xfs libcrc32c libcxgb mlx4_ib ib_core mlx4_en
sd_mod drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm
mlx4_core igb drm ahci libahci ptp libata crc32c_intel pps_core dca nvme
i2c_algo_bit nvme_core i2c_core [last unloaded: cxgb4]
[59985.239258] CPU: 30 PID: 10937 Comm: kworker/30:1 Not tainted
4.9.0-rc3-debug+ #2
[59985.246992] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[59985.254098] Workqueue: events linkwatch_event
[59985.258600] task: ffff88105312c6c0 task.stack: ffffc90020204000
[59985.264657] RIP: 0010:[<ffffffffa05ae1ba>] [<ffffffffa05ae1ba>]
mlx4_en_get_phys_port_id+0x1a/0x50 [mlx4_en]
[59985.274874] RSP: 0018:ffffc90020207c30 EFLAGS: 00010286
[59985.280312] RAX: 6b6b6b6b6b6b6b6b RBX: ffff881048c220c0 RCX: 0000000000000000
[59985.287582] RDX: 0000000000000001 RSI: ffffc90020207cb0 RDI: ffff881037020000
[59985.294844] RBP: ffffc90020207c30 R08: 00000000000005f0 R09: ffff88102017e752
[59985.302100] R10: ffff88085f4090c0 R11: ffff88102017e678 R12: ffff881037020000
[59985.309356] R13: ffff88102017e678 R14: 0000000000000000 R15: 0000000000000000
[59985.316608] FS: 0000000000000000(0000) GS:ffff881057580000(0000)
knlGS:0000000000000000
[59985.324936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[59985.330805] CR2: 00007fff8fd82ff8 CR3: 0000000001c07000 CR4: 00000000000406e0
[59985.338072] Stack:
[59985.340219] ffffc90020207c40 ffffffff81587a6e ffffc90020207d00
ffffffff815a36ce
[59985.347950] ffff881048c220c0 ffffc90020207cd7 0000000000000000
0000000000000010
[59985.355684] 02000000ffffffff 000003e820000000 00000000000005dc
0000010000000000
[59985.363408] Call Trace:
[59985.365994] [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
[59985.372123] [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
[59985.378076] [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
[59985.384377] [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
[59985.390505] [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
[59985.395940] [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
[59985.401983] [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
[59985.407764] [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
[59985.414067] [<ffffffff815a6205>] linkwatch_event+0x25/0x30
[59985.419764] [<ffffffff81099a82>] process_one_work+0x152/0x400
[59985.425716] [<ffffffff8109a325>] worker_thread+0x125/0x4b0
[59985.431409] [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
[59985.437366] [<ffffffff8109fc6a>] kthread+0xca/0xe0
[59985.442367] [<ffffffff8109fba0>] ? kthread_park+0x60/0x60
[59985.447978] [<ffffffff816a1285>] ret_from_fork+0x25/0x30
[59985.453497] Code: f0 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
66 90 55 48 8b 87 c0 08 00 00 48 63 97 9c d5 00 00 48 89 e5 48 8b 00 <48> 8b 94
d0 58 02 00 00 48 85 d2 74 1c c6 46 20 08 31 c0 88 54
[59985.474081] RIP [<ffffffffa05ae1ba>] mlx4_en_get_phys_port_id+0x1a/0x50
[mlx4_en]
[59985.481915] RSP <ffffc90020207c30>
[59985.485910] ---[ end trace 317937c8890959b8 ]---
[59990.228721] Kernel panic - not syncing: Fatal exception
[59990.234181] Kernel Offset: disabled
[59990.239944] ---[ end Kernel panic - not syncing: Fatal exception
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-04 5:44 UTC (permalink / raw)
To: Liran Liss
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <AM4PR0501MB28025BE002CBA9D04675A5A5B1A20-dp/nxUn679jTOi/YP668sMDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
Hi Liran,
On Fri, Nov 4, 2016 at 10:36 AM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> From: Parav Pandit [mailto:pandit.parav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
>
>>
>> On Fri, Nov 4, 2016 at 10:22 AM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> >> From: Parav Pandit [mailto:pandit.parav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
>> >
>> >> >
>> >> > A global HCA metric is indeed in the right direction.
>> >> > However, rethinking this, I think that we should specify the metric
>> >> > in terms of
>> >> RDMA objects rather than percentage.
>> >> > Basically, any resource that consumes an IDR is charged.
>> >> >
>> >> If metric definition is based on RDMA objects (count) and not based
>> >> on percentage, how would user specify the metric without really
>> >> specifying object type.
>> >> Current patch defines the metric as absolute numbers and objects as well.
>> >>
>> >
>> > That is the requested change. The absolute number would account for any
>> object allocation. We won't distinguish between types.
>> > Only a single counter (per device).
>> >
>>
>> In that case ucontext deserve a additional count. Because that is handful in
>> range of 256 to 1K.
>> If we give absolute consolidated number as 2000, one container will allocate all
>> the doorbell uctx and no other container can run.
>> Percentage works for this particular case.
>>
>
> Hmm..
> I guess that you are right.
>
> So we can add another count for "HCA handles",
I prefer this. It keeps things vendor-agnostic and clean if we don't go
the percentage route.
Would the indirection table also fall into this category?
> or alternatively, each provider will restrict the number of handles per device to a reasonable small number (which
> won't be treated as one of the "HCA resources").
This would require vendor drivers to understand cgroup objects and
PIDs, which breaks the modular approach. I would like to avoid
this.
> Typically, a process shouldn't need to open more than a single handle...
Right. A well-behaved application won't open multiple handles.
>
^ permalink raw reply
* RE: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Liran Liss @ 2016-11-04 5:06 UTC (permalink / raw)
To: Parav Pandit
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <CAG53R5UyZPh9wduPZGRg2P09n2Og8oODqb+QW=7ryAPqJDa6Vw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> From: Parav Pandit [mailto:pandit.parav@gmail.com]
>
> On Fri, Nov 4, 2016 at 10:22 AM, Liran Liss <liranl@mellanox.com> wrote:
> >> From: Parav Pandit [mailto:pandit.parav@gmail.com]
> >
> >> >
> >> > A global HCA metric is indeed in the right direction.
> >> > However, rethinking this, I think that we should specify the metric
> >> > in terms of
> >> RDMA objects rather than percentage.
> >> > Basically, any resource that consumes an IDR is charged.
> >> >
> >> If metric definition is based on RDMA objects (count) and not based
> >> on percentage, how would user specify the metric without really
> >> specifying object type.
> >> Current patch defines the metric as absolute numbers and objects as well.
> >>
> >
> > That is the requested change. The absolute number would account for any
> object allocation. We won't distinguish between types.
> > Only a single counter (per device).
> >
>
> In that case ucontext deserve a additional count. Because that is handful in
> range of 256 to 1K.
> If we give absolute consolidated number as 2000, one container will allocate all
> the doorbell uctx and no other container can run.
> Percentage works for this particular case.
>
Hmm..
I guess that you are right.
So we can add another count for "HCA handles", or alternatively, each provider will restrict the number of handles per device to a reasonably small number (which won't be treated as one of the "HCA resources").
Typically, a process shouldn't need to open more than a single handle...
>
> >> Comment from Leon about his discussion with Matan, Tejun, Christoph
> >> says opposite of this for user level configuration.
> >> May be I am missing something.
> >>
> >> > The reasons are:
> >> > - Some HCAs can have a huge amount of resources (millions of
> >> > objects), of
> >> which even a small percentage may consume a considerable amount of
> >> kernel memory.
> >> > - We follow the same notion as FD limits, which accounts for
> >> > numerous resource types that consume file objects in the kernel
> >> > (files, pipes,
> >> > sockets)
> >> > - The namespaces for RDMA resources are large (usually 24 bits). So
> >> > even large resource counts won't come nowhere close in depleting
> >> > the namespace. (Compare that to the mere 64K socket port space...)
> >> > - The metric measures the actual application usage of resources,
> >> > rather than
> >> proportional to the resources of a given HCA adapter.
> >> > - We can continue to use the cgroup mechanism for charging (just as
> >> > in the original proposal)
> >> >
> >> > I have discussed this matter with Doug and Matan, and it seems like
> >> > this is the
> >> right direction.
> >> > --Liran
> >> >
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-04 4:57 UTC (permalink / raw)
To: Liran Liss
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <AM4PR0501MB2802E87F709F41DDEC20B7C9B1A20-dp/nxUn679jTOi/YP668sMDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
On Fri, Nov 4, 2016 at 10:22 AM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> From: Parav Pandit [mailto:pandit.parav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
>
>> >
>> > A global HCA metric is indeed in the right direction.
>> > However, rethinking this, I think that we should specify the metric in terms of
>> RDMA objects rather than percentage.
>> > Basically, any resource that consumes an IDR is charged.
>> >
>> If metric definition is based on RDMA objects (count) and not based on
>> percentage, how would user specify the metric without really specifying object
>> type.
>> Current patch defines the metric as absolute numbers and objects as well.
>>
>
> That is the requested change. The absolute number would account for any object allocation. We won't distinguish between types.
> Only a single counter (per device).
>
In that case ucontext deserves an additional count, because only a
handful are available, in the range of 256 to 1K.
If we set the absolute consolidated limit to 2000, one container can
allocate all the doorbell uctx and no other container can run.
A percentage works for this particular case.
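The starvation scenario can be sketched with a toy accounting model. Everything here is hypothetical and not part of the rdmacg patches: struct hca, struct container, charge_uctx and grab_all are illustrative names, the pool size of 256 is taken from the low end of the range mentioned above, and a handle_limit of 0 is assumed to mean that only the single consolidated counter applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: a small, fixed pool of doorbell contexts. */
struct hca {
	unsigned int uctx_total;
	unsigned int uctx_used;
};

/* Hypothetical per-container accounting. */
struct container {
	unsigned int obj_limit;    /* single consolidated object limit */
	unsigned int obj_used;
	unsigned int handle_limit; /* separate handle limit, 0 = none */
	unsigned int handle_used;
};

/* Charge one ucontext; refuse if any limit or the pool is exhausted. */
static bool charge_uctx(struct hca *hca, struct container *c)
{
	assert(hca && c);

	if (c->obj_used >= c->obj_limit)
		return false;
	if (c->handle_limit && c->handle_used >= c->handle_limit)
		return false;
	if (hca->uctx_used >= hca->uctx_total)
		return false;

	c->obj_used++;
	c->handle_used++;
	hca->uctx_used++;
	return true;
}

/* Allocate until refused; returns how many ucontexts were obtained. */
static unsigned int grab_all(struct hca *hca, struct container *c)
{
	unsigned int n = 0;

	while (charge_uctx(hca, c))
		n++;

	return n;
}
```

With a pool of 256 and only the consolidated limit of 2000, the first container to call grab_all obtains all 256 contexts and the second obtains none; with a separate handle limit of, say, 4 per container, both obtain 4.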
>> Comment from Leon about his discussion with Matan, Tejun, Christoph says
>> opposite of this for user level configuration.
>> May be I am missing something.
>>
>> > The reasons are:
>> > - Some HCAs can have a huge amount of resources (millions of objects), of
>> which even a small percentage may consume a considerable amount of kernel
>> memory.
>> > - We follow the same notion as FD limits, which accounts for numerous
>> > resource types that consume file objects in the kernel (files, pipes,
>> > sockets)
>> > - The namespaces for RDMA resources are large (usually 24 bits). So
>> > even large resource counts won't come nowhere close in depleting the
>> > namespace. (Compare that to the mere 64K socket port space...)
>> > - The metric measures the actual application usage of resources, rather than
>> proportional to the resources of a given HCA adapter.
>> > - We can continue to use the cgroup mechanism for charging (just as in
>> > the original proposal)
>> >
>> > I have discussed this matter with Doug and Matan, and it seems like this is the
>> right direction.
>> > --Liran
>> >
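The ucontext concern raised in the reply above can be illustrated with a little arithmetic. This is only a sketch: the device limit of 256 ucontexts is the low end of the range mentioned in the reply, and the helper names are hypothetical, not part of the rdmacg patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed device-wide ucontext limit (low end of the 256..1K range). */
#define DEV_MAX_UCTX 256u

/*
 * With one consolidated object counter, a container limited to obj_limit
 * objects of any type may spend the whole budget on ucontexts.
 */
static uint32_t uctx_reachable_consolidated(uint32_t obj_limit)
{
	return obj_limit < DEV_MAX_UCTX ? obj_limit : DEV_MAX_UCTX;
}

/*
 * With a percentage applied to every object type, the container can take
 * at most that share of the device's ucontexts.
 */
static uint32_t uctx_reachable_percent(uint32_t percent)
{
	assert(percent <= 100);
	return DEV_MAX_UCTX * percent / 100;
}
```

Under these assumptions a consolidated limit of 2000 lets a single container take all 256 ucontexts, while a 50% limit leaves 128 for others.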
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Liran Liss @ 2016-11-04 4:52 UTC (permalink / raw)
To: Parav Pandit
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <CAG53R5Vd58wEBKgAajp9VvJmB5sO2Umii0JE4XaLYKbfrJrxyg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1963 bytes --]
> From: Parav Pandit [mailto:pandit.parav@gmail.com]
> >
> > A global HCA metric is indeed in the right direction.
> > However, rethinking this, I think that we should specify the metric in terms of
> RDMA objects rather than percentage.
> > Basically, any resource that consumes an IDR is charged.
> >
> If metric definition is based on RDMA objects (count) and not based on
> percentage, how would user specify the metric without really specifying object
> type.
> Current patch defines the metric as absolute numbers and objects as well.
>
That is the requested change. The absolute number would account for any object allocation. We won't distinguish between types.
Only a single counter (per device).
> Comment from Leon about his discussion with Matan, Tejun, Christoph says
> opposite of this for user level configuration.
> May be I am missing something.
>
> > The reasons are:
> > - Some HCAs can have a huge amount of resources (millions of objects), of
> which even a small percentage may consume a considerable amount of kernel
> memory.
> > - We follow the same notion as FD limits, which accounts for numerous
> > resource types that consume file objects in the kernel (files, pipes,
> > sockets)
> > - The namespaces for RDMA resources are large (usually 24 bits). So
> > even large resource counts won't come nowhere close in depleting the
> > namespace. (Compare that to the mere 64K socket port space...)
> > - The metric measures the actual application usage of resources, rather than
> proportional to the resources of a given HCA adapter.
> > - We can continue to use the cgroup mechanism for charging (just as in
> > the original proposal)
> >
> > I have discussed this matter with Doug and Matan, and it seems like this is the
> right direction.
> > --Liran
> >
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-04 4:47 UTC (permalink / raw)
To: Liran Liss
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <AM4PR0501MB2802030EE9E359133E04439CB1A20-dp/nxUn679jTOi/YP668sMDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
Hi Liran,
On Fri, Nov 4, 2016 at 9:50 AM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
>
>> We (Tejun, Christoph, Matan and me) had a face-to-face talk during KS/LPC and
>> decided that the best way to move forward is to export to user one object
>> (global HCA like) only and don't export anything else.
>>
>> All internal calculations will be based on this percentage.
>>
>> Once the cgroups users will come with reasonable justification why they need to
>> configure different unexposed objects, we will expose them.
>
> A global HCA metric is indeed in the right direction.
> However, rethinking this, I think that we should specify the metric in terms of RDMA objects rather than percentage.
> Basically, any resource that consumes an IDR is charged.
>
If the metric definition is based on RDMA objects (count) and not on
percentage, how would the user specify the metric without actually
specifying the object type?
The current patch defines the metric as absolute numbers, and defines
the objects as well.
Leon's comment about his discussion with Matan, Tejun and Christoph
says the opposite of this for user-level configuration.
Maybe I am missing something.
> The reasons are:
> - Some HCAs can have a huge amount of resources (millions of objects), of which even a small percentage may consume a considerable amount of kernel memory.
> - We follow the same notion as FD limits, which accounts for numerous resource types that consume file objects in the kernel (files, pipes, sockets)
> - The namespaces for RDMA resources are large (usually 24 bits). So even large resource counts won't come nowhere close in depleting the namespace. (Compare that to the mere 64K socket port space...)
> - The metric measures the actual application usage of resources, rather than proportional to the resources of a given HCA adapter.
> - We can continue to use the cgroup mechanism for charging (just as in the original proposal)
>
> I have discussed this matter with Doug and Matan, and it seems like this is the right direction.
> --Liran
>
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-04 4:28 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <20161103180006.GL3617-2ukJVAZIZ/Y@public.gmane.org>
Hi Leon, Christoph, Matan, Tejun,
Thanks for the update.
I need some more information in order to roll out a new patch.
Clarifications inline below.
On Thu, Nov 3, 2016 at 11:30 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Nov 01, 2016 at 04:33:23PM +0530, Parav Pandit wrote:
>> So my opinion is:
>> (a) Let cgroup define the current standard objects and new reasonable
>> set of vendor specific objects in future.
>> (b) Add new rdma.percentage parameter so that any new standard object
>> or vendor specific object can be abstracted from average end user and
>> applications which are yet to catch up.
>> I believe this takes care of your point (1), (3), (4)?
>
> We (Tejun, Christoph, Matan and me) had a face-to-face talk during
> KS/LPC and decided that the best way to move forward is to export to
> user one object (global HCA like) only and don't export anything else.
>
Can you please confirm the points below to make sure the design fits?
1. rdma.current and rdma.max will each show one overall percentage
(currently used and configured limit, respectively), instead of a
per-object absolute value?
2. As a starting point, the minimum percentage will be 1%. The default will be 100%.
3. For example, if the user has configured 2% of the resources, this 2%
will apply as 2% of MRs, 2% of QPs and so on.
4. The rdma cgroup continues to do accounting and resource definition as
done in patch_v12.
Though there is provision for defining a handful of vendor-specific
objects in the rdma cgroup, we don't define them currently and therefore
they won't be accounted.
5. In future, when the need arises to account vendor-specific objects,
they will be added to the rdma cgroup.
> All internal calculations will be based on this percentage.
>
> Once the cgroups users will come with reasonable justification why they
> need to configure different unexposed objects, we will expose them.
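Point 3 above (one configured percentage applied to every object type) could be sketched as follows. This is a minimal illustration, not code from patch_v12; the helper name rdma_pct_limit and the device maximums used in the note below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the per-type limit for a cgroup from the
 * device-wide maximum of that type and the single configured percentage
 * (1..100, per point 2 above).
 */
static uint32_t rdma_pct_limit(uint32_t device_max, uint32_t percent)
{
	assert(percent >= 1 && percent <= 100);
	/* 64-bit intermediate so that large device maximums do not overflow. */
	return (uint32_t)(((uint64_t)device_max * percent) / 100);
}
```

With a 2% setting, a device offering 1000000 MRs would yield a limit of 20000 MRs, and one offering 256 ucontexts would yield 5. Note that integer division rounds down, so a small pool combined with a small percentage can produce a limit of zero; whether to round up in that case is a design decision this sketch does not settle.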
^ permalink raw reply
* RE: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Liran Liss @ 2016-11-04 4:20 UTC (permalink / raw)
To: Leon Romanovsky, Parav Pandit
Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma, Li Zefan, Johannes Weiner, Doug Ledford,
Christoph Hellwig, Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <20161103180006.GL3617-2ukJVAZIZ/Y@public.gmane.org>
> From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
> We (Tejun, Christoph, Matan and me) had a face-to-face talk during KS/LPC and
> decided that the best way to move forward is to export to user one object
> (global HCA like) only and don't export anything else.
>
> All internal calculations will be based on this percentage.
>
> Once the cgroups users will come with reasonable justification why they need to
> configure different unexposed objects, we will expose them.
A global HCA metric is indeed in the right direction.
However, rethinking this, I think that we should specify the metric in terms of RDMA objects rather than percentage.
Basically, any resource that consumes an IDR is charged.
The reasons are:
- Some HCAs can have a huge amount of resources (millions of objects), of which even a small percentage may consume a considerable amount of kernel memory.
- We follow the same notion as FD limits, which accounts for numerous resource types that consume file objects in the kernel (files, pipes, sockets)
- The namespaces for RDMA resources are large (usually 24 bits). So even large resource counts won't come anywhere close to depleting the namespace. (Compare that to the mere 64K socket port space...)
- The metric measures the actual application usage of resources, rather than a proportion of the resources of a given HCA.
- We can continue to use the cgroup mechanism for charging (just as in the original proposal)
I have discussed this matter with Doug and Matan, and it seems like this is the right direction.
--Liran
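A minimal sketch of the single per-device counter described above, in which every IDR-consuming object is charged against one absolute limit regardless of its type. The names rdma_obj_pool, rdma_obj_try_charge and rdma_obj_uncharge are hypothetical and not taken from the rdmacg patches; a real controller would charge through the cgroup hierarchy and take the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup, per-device pool: one counter for all types. */
struct rdma_obj_pool {
	uint32_t usage;	/* objects currently charged */
	uint32_t max;	/* configured absolute limit */
};

/*
 * Charge one IDR-consuming object (QP, CQ, MR, PD, ...). The object type
 * is deliberately not a parameter: all types draw from the same counter.
 */
static bool rdma_obj_try_charge(struct rdma_obj_pool *pool)
{
	if (pool->usage >= pool->max)
		return false;	/* limit reached, the allocation is refused */
	pool->usage++;
	return true;
}

/* Release one previously charged object. */
static void rdma_obj_uncharge(struct rdma_obj_pool *pool)
{
	assert(pool->usage > 0);
	pool->usage--;
}
```

This mirrors the FD-limit analogy in the list above: the caller would invoke rdma_obj_try_charge() wherever an IDR entry is allocated and rdma_obj_uncharge() wherever it is freed.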
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Leon Romanovsky @ 2016-11-04 4:20 UTC (permalink / raw)
To: Parav Pandit
Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <20161103180006.GL3617-2ukJVAZIZ/Y@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]
On Thu, Nov 03, 2016 at 08:00:06PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 01, 2016 at 04:33:23PM +0530, Parav Pandit wrote:
> > So my opinion is:
> > (a) Let cgroup define the current standard objects and new reasonable
> > set of vendor specific objects in future.
> > (b) Add new rdma.percentage parameter so that any new standard object
> > or vendor specific object can be abstracted from average end user and
> > applications which are yet to catch up.
> > I believe this takes care of your point (1), (3), (4)?
>
> We (Tejun, Christoph, Matan and me) had a face-to-face talk during
> KS/LPC and decided that the best way to move forward is to export to
> user one object (global HCA like) only and don't export anything else.
>
> All internal calculations will be based on this percentage.
To simplify things further for users and developers, this global cgroup
object should be based not on a percentage but on an actual number of
object units, where an object unit is any object that consumes an IDR.
The IDR consumers can be of any type. Such simplification will give
excellent scalability to the cgroup without sacrificing user experience.
>
> Once the cgroups users will come with reasonable justification why they
> need to configure different unexposed objects, we will expose them.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH rdma-core] rxe: Use default dual-license instead of PathScale
From: Leon Romanovsky @ 2016-11-04 4:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, monis-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161104004813.GB30318-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 477 bytes --]
On Thu, Nov 03, 2016 at 06:48:13PM -0600, Jason Gunthorpe wrote:
> On Thu, Nov 03, 2016 at 05:49:15PM +0200, Leon Romanovsky wrote:
> > Remove the patent clauses from RXE copyright notice.
> >
> > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>
> Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
>
> Thanks for getting this addressed!
Thanks
https://github.com/linux-rdma/rdma-core/pull/33
>
> Jason
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Matan Barak @ 2016-11-04 4:12 UTC (permalink / raw)
To: Liran Liss
Cc: Doug Ledford, Christoph Lameter,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
skc-YOWKrPYUwWM@public.gmane.org,
ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
Jason Gunthorpe,
john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org, Matan Barak
In-Reply-To: <HE1PR0501MB2812B2B88993AC90E0FBD72AB1A30-692Kmc8YnlIVrnpjwTCbp8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
On Thu, Nov 3, 2016 at 11:54 PM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Matan told me that he will advertise a git with the latest patches applied by EOD.
>
>> -----Original Message-----
>> From: Doug Ledford [mailto:dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
>> Sent: Thursday, November 03, 2016 3:41 PM
>> To: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Cc: skc-YOWKrPYUwWM@public.gmane.org; ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org; Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org; leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org;
>> Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org; Matan Barak
>> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Subject: Re: RDMA developer gatherings around Kernel Summit and Linux
>> Plumbers in Santa Fe
>>
>> On 11/3/16 2:49 PM, Christoph Lameter wrote:
>> >> Saturday sessions 9am till 4pm. 12-1pm Lunchtime
>> >>
>> >> 9am Refine TODO list for consolidated library - Jason Gunthorpe
>> >> 10am Submission process for multi subsystem drivers - Doug Ledford
>> >> 11am Multicast features and gaps - Christoph Lameter
>> >>
>> >> 1pm Licensing carryover - Susan/Christoph
>> >> 2pm Standard network tools, integrating to the regular network
>> stack - Christoph
>> >> 3pm Open Discussion/Reserve Session - TBD
>> >> 4pm Closing Session - TBD
>> >
>> > Ok we have an on going conversation regarding the ioctl and I think
>> > that is of high importance. We tried to find a room for a meeting on
>> > Friday on this but we do not have access to a projector. I would like
>> > to have this issue dealt with first on Saturday and then we can
>> > rearrange times for the other presentations. I could skip some of my
>> > sessions if necessary and we have 2 hours that are pretty flexible at
>> > the end anyways. I hope that is agreeable to everyone?
>> >
>>
>> I'm agreeable with that.
>>
>> --
>> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG Key ID: 0E572FDD
>> Red Hat, Inc.
>> 100 E. Davie St
>> Raleigh, NC 27601 USA
>
I would like to clean the series up a bit and make it bisect-able
before converting it from RFC to actual patches. So it's not just
wrapping with CONFIG_EXPERIMENTAL.
In the meantime, you could review the tree in my github:
https://github.com/matanb10/linux branch: abi_rfc_v5
Regards,
Matan
^ permalink raw reply
* Re: [PATCH rdma-core] rxe: Use default dual-license instead of PathScale
From: Jason Gunthorpe @ 2016-11-04 0:48 UTC (permalink / raw)
To: Leon Romanovsky
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, monis-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1478188155-24018-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
On Thu, Nov 03, 2016 at 05:49:15PM +0200, Leon Romanovsky wrote:
> Remove the patent clauses from RXE copyright notice.
>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Thanks for getting this addressed!
Jason
^ permalink raw reply
* Re: [PATCH rdma-core 0/8] libpvrdma: userspace library for PVRDMA
From: Jason Gunthorpe @ 2016-11-04 0:46 UTC (permalink / raw)
To: Adit Ranadive
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
On Thu, Nov 03, 2016 at 04:44:29PM -0700, Adit Ranadive wrote:
>
> I have included the shared ABI file here based on the RDMA fix up stuff
> that Jason pointed me to.
I left you some trivial notes on github.
The big item is that the shared ABI file must be byte for byte
identical to the kernel version, and it looks to me like it was
changed?
We still do not have a general solution to the need to add the header
struct in user space but not in kernel space, so you will need to
continue to get your enums from the kernel header but still have a
'copy' with the modified structs.
Does that make sense?
Jason
^ permalink raw reply
* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jason Gunthorpe @ 2016-11-04 0:42 UTC (permalink / raw)
To: Doug Ledford
Cc: Jarod Wilson, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <581B9F91.4050407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Thu, Nov 03, 2016 at 02:35:29PM -0600, Doug Ledford wrote:
> >>> +%package -n librdmacm-utils
> >>> +Summary: Examples for the librdmacm library
> >>> +Requires: librdmacm%{?_isa} = %{version}-%{release}
> >>
> >> Why the requires? Shouldn't auto shlib dependencies take care of that?
> >
> > Probably. I think this was another legacy bit copied over from a
> > stand-alone spec file.
>
> Actually, no. When you have a -utils package that goes with a library
> package, standard procedure is to tie them directly like this. The auto
> dependency stuff will allow, say, librdmacm-1.1.17-1 and
> librdmacm-utils-1.1.16-1 to happily satisfy each other since the later
> librdmacm provides all of the sonames and apis that the -utils package
> needs. This is as designed as you want a librdamcm update to not
> trigger a required update of, say, openmpi, unless there is truly a
> change that requires it. But, for the utils that go with the library,
> even though we don't *have* to update them with the library, we want
> that to happen automatically, so the explicit requires makes that happen
> even if librdmacm-utils was excluded from the update command.
Okay, Jarod you will need to send a patch to put this back, because I
applied all the changes discussed in this email when I made the pull
request.
Thanks,
Jason
^ permalink raw reply
* [PATCH 8/8] libpvrdma: Add fix up for ABI file
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Use the fix up added by Jason to use the kernel version of pvrdma-abi.h
if it exists.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
buildlib/RDMA_LinuxHeaders.cmake | 1 +
buildlib/fixup-include/rdma-pvrdma-abi.h | 297 +++++++++++++++++++++++++++++++
providers/pvrdma/pvrdma-abi.h | 297 -------------------------------
providers/pvrdma/pvrdma.h | 2 +-
4 files changed, 299 insertions(+), 298 deletions(-)
create mode 100644 buildlib/fixup-include/rdma-pvrdma-abi.h
delete mode 100644 providers/pvrdma/pvrdma-abi.h
diff --git a/buildlib/RDMA_LinuxHeaders.cmake b/buildlib/RDMA_LinuxHeaders.cmake
index c67b0a6..4689cd1 100644
--- a/buildlib/RDMA_LinuxHeaders.cmake
+++ b/buildlib/RDMA_LinuxHeaders.cmake
@@ -83,3 +83,4 @@ rdma_check_kheader("rdma/ib_user_mad.h" "${DEFAULT_TEST}")
rdma_check_kheader("rdma/rdma_netlink.h" "int main(int argc,const char *argv[]) { return RDMA_NL_IWPM_REMOTE_INFO && RDMA_NL_IWCM; }")
rdma_check_kheader("rdma/rdma_user_cm.h" "${DEFAULT_TEST}")
rdma_check_kheader("rdma/rdma_user_rxe.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/pvrdma-abi.h" "${DEFAULT_TEST}")
diff --git a/buildlib/fixup-include/rdma-pvrdma-abi.h b/buildlib/fixup-include/rdma-pvrdma-abi.h
new file mode 100644
index 0000000..c7a38c5
--- /dev/null
+++ b/buildlib/fixup-include/rdma-pvrdma-abi.h
@@ -0,0 +1,297 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __PVRDMA_ABI_H__
+#define __PVRDMA_ABI_H__
+
+#include <infiniband/kern-abi.h>
+
+#define PVRDMA_UVERBS_ABI_VERSION 3
+#define PVRDMA_UAR_HANDLE_MASK 0x00FFFFFF /* Bottom 24 bits. */
+#define PVRDMA_UAR_QP_OFFSET 0 /* QP doorbell offset. */
+#define PVRDMA_UAR_QP_SEND BIT(30) /* Send bit. */
+#define PVRDMA_UAR_QP_RECV BIT(31) /* Recv bit. */
+#define PVRDMA_UAR_CQ_OFFSET 4 /* CQ doorbell offset. */
+#define PVRDMA_UAR_CQ_ARM_SOL BIT(29) /* Arm solicited bit. */
+#define PVRDMA_UAR_CQ_ARM BIT(30) /* Arm bit. */
+#define PVRDMA_UAR_CQ_POLL BIT(31) /* Poll bit. */
+
+enum pvrdma_wr_opcode {
+ PVRDMA_WR_RDMA_WRITE,
+ PVRDMA_WR_RDMA_WRITE_WITH_IMM,
+ PVRDMA_WR_SEND,
+ PVRDMA_WR_SEND_WITH_IMM,
+ PVRDMA_WR_RDMA_READ,
+ PVRDMA_WR_ATOMIC_CMP_AND_SWP,
+ PVRDMA_WR_ATOMIC_FETCH_AND_ADD,
+ PVRDMA_WR_LSO,
+ PVRDMA_WR_SEND_WITH_INV,
+ PVRDMA_WR_RDMA_READ_WITH_INV,
+ PVRDMA_WR_LOCAL_INV,
+ PVRDMA_WR_FAST_REG_MR,
+ PVRDMA_WR_MASKED_ATOMIC_CMP_AND_SWP,
+ PVRDMA_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+ PVRDMA_WR_BIND_MW,
+ PVRDMA_WR_REG_SIG_MR,
+};
+
+enum pvrdma_wc_status {
+ PVRDMA_WC_SUCCESS,
+ PVRDMA_WC_LOC_LEN_ERR,
+ PVRDMA_WC_LOC_QP_OP_ERR,
+ PVRDMA_WC_LOC_EEC_OP_ERR,
+ PVRDMA_WC_LOC_PROT_ERR,
+ PVRDMA_WC_WR_FLUSH_ERR,
+ PVRDMA_WC_MW_BIND_ERR,
+ PVRDMA_WC_BAD_RESP_ERR,
+ PVRDMA_WC_LOC_ACCESS_ERR,
+ PVRDMA_WC_REM_INV_REQ_ERR,
+ PVRDMA_WC_REM_ACCESS_ERR,
+ PVRDMA_WC_REM_OP_ERR,
+ PVRDMA_WC_RETRY_EXC_ERR,
+ PVRDMA_WC_RNR_RETRY_EXC_ERR,
+ PVRDMA_WC_LOC_RDD_VIOL_ERR,
+ PVRDMA_WC_REM_INV_RD_REQ_ERR,
+ PVRDMA_WC_REM_ABORT_ERR,
+ PVRDMA_WC_INV_EECN_ERR,
+ PVRDMA_WC_INV_EEC_STATE_ERR,
+ PVRDMA_WC_FATAL_ERR,
+ PVRDMA_WC_RESP_TIMEOUT_ERR,
+ PVRDMA_WC_GENERAL_ERR,
+};
+
+enum pvrdma_wc_opcode {
+ PVRDMA_WC_SEND,
+ PVRDMA_WC_RDMA_WRITE,
+ PVRDMA_WC_RDMA_READ,
+ PVRDMA_WC_COMP_SWAP,
+ PVRDMA_WC_FETCH_ADD,
+ PVRDMA_WC_BIND_MW,
+ PVRDMA_WC_LSO,
+ PVRDMA_WC_LOCAL_INV,
+ PVRDMA_WC_FAST_REG_MR,
+ PVRDMA_WC_MASKED_COMP_SWAP,
+ PVRDMA_WC_MASKED_FETCH_ADD,
+ PVRDMA_WC_RECV = 1 << 7,
+ PVRDMA_WC_RECV_RDMA_WITH_IMM,
+};
+
+enum pvrdma_wc_flags {
+ PVRDMA_WC_GRH = 1 << 0,
+ PVRDMA_WC_WITH_IMM = 1 << 1,
+ PVRDMA_WC_WITH_INVALIDATE = 1 << 2,
+ PVRDMA_WC_IP_CSUM_OK = 1 << 3,
+ PVRDMA_WC_WITH_SMAC = 1 << 4,
+ PVRDMA_WC_WITH_VLAN = 1 << 5,
+ PVRDMA_WC_FLAGS_MAX = PVRDMA_WC_WITH_VLAN,
+};
+
+struct pvrdma_alloc_ucontext_resp {
+ struct ibv_get_context_resp ibv_resp;
+ __u32 qp_tab_size;
+ __u32 reserved;
+};
+
+struct pvrdma_alloc_pd_resp {
+ struct ibv_alloc_pd_resp ibv_resp;
+ __u32 pdn;
+ __u32 reserved;
+};
+
+struct pvrdma_create_cq {
+ struct ibv_create_cq ibv_cmd;
+ __u64 buf_addr;
+ __u32 buf_size;
+ __u32 reserved;
+};
+
+struct pvrdma_create_cq_resp {
+ struct ibv_create_cq_resp ibv_resp;
+ __u32 cqn;
+ __u32 reserved;
+};
+
+struct pvrdma_resize_cq {
+ struct ibv_resize_cq ibv_cmd;
+ __u64 buf_addr;
+ __u32 buf_size;
+ __u32 reserved;
+};
+
+struct pvrdma_create_srq {
+ struct ibv_create_srq ibv_cmd;
+ __u64 buf_addr;
+};
+
+struct pvrdma_create_srq_resp {
+ struct ibv_create_srq_resp ibv_resp;
+ __u32 srqn;
+ __u32 reserved;
+};
+
+struct pvrdma_create_qp {
+ struct ibv_create_qp ibv_cmd;
+ __u64 rbuf_addr;
+ __u64 sbuf_addr;
+ __u32 rbuf_size;
+ __u32 sbuf_size;
+ __u64 qp_addr;
+};
+
+/* PVRDMA masked atomic compare and swap */
+struct pvrdma_ex_cmp_swap {
+ __u64 swap_val;
+ __u64 compare_val;
+ __u64 swap_mask;
+ __u64 compare_mask;
+};
+
+/* PVRDMA masked atomic fetch and add */
+struct pvrdma_ex_fetch_add {
+ __u64 add_val;
+ __u64 field_boundary;
+};
+
+/* PVRDMA address vector. */
+struct pvrdma_av {
+ __u32 port_pd;
+ __u32 sl_tclass_flowlabel;
+ __u8 dgid[16];
+ __u8 src_path_bits;
+ __u8 gid_index;
+ __u8 stat_rate;
+ __u8 hop_limit;
+ __u8 dmac[6];
+ __u8 reserved[6];
+};
+
+/* PVRDMA scatter/gather entry */
+struct pvrdma_sge {
+ __u64 addr;
+ __u32 length;
+ __u32 lkey;
+};
+
+/* PVRDMA receive queue work request */
+struct pvrdma_rq_wqe_hdr {
+ __u64 wr_id; /* wr id */
+ __u32 num_sge; /* size of s/g array */
+ __u32 total_len; /* reserved */
+};
+/* Use pvrdma_sge (ib_sge) for receive queue s/g array elements. */
+
+/* PVRDMA send queue work request */
+struct pvrdma_sq_wqe_hdr {
+ __u64 wr_id; /* wr id */
+ __u32 num_sge; /* size of s/g array */
+ __u32 total_len; /* reserved */
+ __u32 opcode; /* operation type */
+ __u32 send_flags; /* wr flags */
+ union {
+ __u32 imm_data;
+ __u32 invalidate_rkey;
+ } ex;
+ __u32 reserved;
+ union {
+ struct {
+ __u64 remote_addr;
+ __u32 rkey;
+ __u8 reserved[4];
+ } rdma;
+ struct {
+ __u64 remote_addr;
+ __u64 compare_add;
+ __u64 swap;
+ __u32 rkey;
+ __u32 reserved;
+ } atomic;
+ struct {
+ __u64 remote_addr;
+ __u32 log_arg_sz;
+ __u32 rkey;
+ union {
+ struct pvrdma_ex_cmp_swap cmp_swap;
+ struct pvrdma_ex_fetch_add fetch_add;
+ } wr_data;
+ } masked_atomics;
+ struct {
+ __u64 iova_start;
+ __u64 pl_pdir_dma;
+ __u32 page_shift;
+ __u32 page_list_len;
+ __u32 length;
+ __u32 access_flags;
+ __u32 rkey;
+ } fast_reg;
+ struct {
+ __u32 remote_qpn;
+ __u32 remote_qkey;
+ struct pvrdma_av av;
+ } ud;
+ } wr;
+};
+/* Use pvrdma_sge (ib_sge) for send queue s/g array elements. */
+
+/* Completion queue element. */
+struct pvrdma_cqe {
+ __u64 wr_id;
+ __u64 qp;
+ __u32 opcode;
+ __u32 status;
+ __u32 byte_len;
+ __u32 imm_data;
+ __u32 src_qp;
+ __u32 wc_flags;
+ __u32 vendor_err;
+ __u16 pkey_index;
+ __u16 slid;
+ __u8 sl;
+ __u8 dlid_path_bits;
+ __u8 port_num;
+ __u8 smac[6];
+ __u8 reserved2[7]; /* Pad to next power of 2 (64). */
+};
+
+#endif /* __PVRDMA_ABI_H__ */
diff --git a/providers/pvrdma/pvrdma-abi.h b/providers/pvrdma/pvrdma-abi.h
deleted file mode 100644
index c7a38c5..0000000
--- a/providers/pvrdma/pvrdma-abi.h
+++ /dev/null
@@ -1,297 +0,0 @@
-/*
- * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of EITHER the GNU General Public License
- * version 2 as published by the Free Software Foundation or the BSD
- * 2-Clause License. This program is distributed in the hope that it
- * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
- * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
- * See the GNU General Public License version 2 for more details at
- * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program available in the file COPYING in the main
- * directory of this source tree.
- *
- * The BSD 2-Clause License
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * - Redistributions of source code must retain the above
- * copyright notice, this list of conditions and the following
- * disclaimer.
- *
- * - Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials
- * provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
- * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
- * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
- * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
- * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
- * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
- * OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef __PVRDMA_ABI_H__
-#define __PVRDMA_ABI_H__
-
-#include <infiniband/kern-abi.h>
-
-#define PVRDMA_UVERBS_ABI_VERSION 3
-#define PVRDMA_UAR_HANDLE_MASK 0x00FFFFFF /* Bottom 24 bits. */
-#define PVRDMA_UAR_QP_OFFSET 0 /* QP doorbell offset. */
-#define PVRDMA_UAR_QP_SEND BIT(30) /* Send bit. */
-#define PVRDMA_UAR_QP_RECV BIT(31) /* Recv bit. */
-#define PVRDMA_UAR_CQ_OFFSET 4 /* CQ doorbell offset. */
-#define PVRDMA_UAR_CQ_ARM_SOL BIT(29) /* Arm solicited bit. */
-#define PVRDMA_UAR_CQ_ARM BIT(30) /* Arm bit. */
-#define PVRDMA_UAR_CQ_POLL BIT(31) /* Poll bit. */
-
-enum pvrdma_wr_opcode {
- PVRDMA_WR_RDMA_WRITE,
- PVRDMA_WR_RDMA_WRITE_WITH_IMM,
- PVRDMA_WR_SEND,
- PVRDMA_WR_SEND_WITH_IMM,
- PVRDMA_WR_RDMA_READ,
- PVRDMA_WR_ATOMIC_CMP_AND_SWP,
- PVRDMA_WR_ATOMIC_FETCH_AND_ADD,
- PVRDMA_WR_LSO,
- PVRDMA_WR_SEND_WITH_INV,
- PVRDMA_WR_RDMA_READ_WITH_INV,
- PVRDMA_WR_LOCAL_INV,
- PVRDMA_WR_FAST_REG_MR,
- PVRDMA_WR_MASKED_ATOMIC_CMP_AND_SWP,
- PVRDMA_WR_MASKED_ATOMIC_FETCH_AND_ADD,
- PVRDMA_WR_BIND_MW,
- PVRDMA_WR_REG_SIG_MR,
-};
-
-enum pvrdma_wc_status {
- PVRDMA_WC_SUCCESS,
- PVRDMA_WC_LOC_LEN_ERR,
- PVRDMA_WC_LOC_QP_OP_ERR,
- PVRDMA_WC_LOC_EEC_OP_ERR,
- PVRDMA_WC_LOC_PROT_ERR,
- PVRDMA_WC_WR_FLUSH_ERR,
- PVRDMA_WC_MW_BIND_ERR,
- PVRDMA_WC_BAD_RESP_ERR,
- PVRDMA_WC_LOC_ACCESS_ERR,
- PVRDMA_WC_REM_INV_REQ_ERR,
- PVRDMA_WC_REM_ACCESS_ERR,
- PVRDMA_WC_REM_OP_ERR,
- PVRDMA_WC_RETRY_EXC_ERR,
- PVRDMA_WC_RNR_RETRY_EXC_ERR,
- PVRDMA_WC_LOC_RDD_VIOL_ERR,
- PVRDMA_WC_REM_INV_RD_REQ_ERR,
- PVRDMA_WC_REM_ABORT_ERR,
- PVRDMA_WC_INV_EECN_ERR,
- PVRDMA_WC_INV_EEC_STATE_ERR,
- PVRDMA_WC_FATAL_ERR,
- PVRDMA_WC_RESP_TIMEOUT_ERR,
- PVRDMA_WC_GENERAL_ERR,
-};
-
-enum pvrdma_wc_opcode {
- PVRDMA_WC_SEND,
- PVRDMA_WC_RDMA_WRITE,
- PVRDMA_WC_RDMA_READ,
- PVRDMA_WC_COMP_SWAP,
- PVRDMA_WC_FETCH_ADD,
- PVRDMA_WC_BIND_MW,
- PVRDMA_WC_LSO,
- PVRDMA_WC_LOCAL_INV,
- PVRDMA_WC_FAST_REG_MR,
- PVRDMA_WC_MASKED_COMP_SWAP,
- PVRDMA_WC_MASKED_FETCH_ADD,
- PVRDMA_WC_RECV = 1 << 7,
- PVRDMA_WC_RECV_RDMA_WITH_IMM,
-};
-
-enum pvrdma_wc_flags {
- PVRDMA_WC_GRH = 1 << 0,
- PVRDMA_WC_WITH_IMM = 1 << 1,
- PVRDMA_WC_WITH_INVALIDATE = 1 << 2,
- PVRDMA_WC_IP_CSUM_OK = 1 << 3,
- PVRDMA_WC_WITH_SMAC = 1 << 4,
- PVRDMA_WC_WITH_VLAN = 1 << 5,
- PVRDMA_WC_FLAGS_MAX = PVRDMA_WC_WITH_VLAN,
-};
-
-struct pvrdma_alloc_ucontext_resp {
- struct ibv_get_context_resp ibv_resp;
- __u32 qp_tab_size;
- __u32 reserved;
-};
-
-struct pvrdma_alloc_pd_resp {
- struct ibv_alloc_pd_resp ibv_resp;
- __u32 pdn;
- __u32 reserved;
-};
-
-struct pvrdma_create_cq {
- struct ibv_create_cq ibv_cmd;
- __u64 buf_addr;
- __u32 buf_size;
- __u32 reserved;
-};
-
-struct pvrdma_create_cq_resp {
- struct ibv_create_cq_resp ibv_resp;
- __u32 cqn;
- __u32 reserved;
-};
-
-struct pvrdma_resize_cq {
- struct ibv_resize_cq ibv_cmd;
- __u64 buf_addr;
- __u32 buf_size;
- __u32 reserved;
-};
-
-struct pvrdma_create_srq {
- struct ibv_create_srq ibv_cmd;
- __u64 buf_addr;
-};
-
-struct pvrdma_create_srq_resp {
- struct ibv_create_srq_resp ibv_resp;
- __u32 srqn;
- __u32 reserved;
-};
-
-struct pvrdma_create_qp {
- struct ibv_create_qp ibv_cmd;
- __u64 rbuf_addr;
- __u64 sbuf_addr;
- __u32 rbuf_size;
- __u32 sbuf_size;
- __u64 qp_addr;
-};
-
-/* PVRDMA masked atomic compare and swap */
-struct pvrdma_ex_cmp_swap {
- __u64 swap_val;
- __u64 compare_val;
- __u64 swap_mask;
- __u64 compare_mask;
-};
-
-/* PVRDMA masked atomic fetch and add */
-struct pvrdma_ex_fetch_add {
- __u64 add_val;
- __u64 field_boundary;
-};
-
-/* PVRDMA address vector. */
-struct pvrdma_av {
- __u32 port_pd;
- __u32 sl_tclass_flowlabel;
- __u8 dgid[16];
- __u8 src_path_bits;
- __u8 gid_index;
- __u8 stat_rate;
- __u8 hop_limit;
- __u8 dmac[6];
- __u8 reserved[6];
-};
-
-/* PVRDMA scatter/gather entry */
-struct pvrdma_sge {
- __u64 addr;
- __u32 length;
- __u32 lkey;
-};
-
-/* PVRDMA receive queue work request */
-struct pvrdma_rq_wqe_hdr {
- __u64 wr_id; /* wr id */
- __u32 num_sge; /* size of s/g array */
- __u32 total_len; /* reserved */
-};
-/* Use pvrdma_sge (ib_sge) for receive queue s/g array elements. */
-
-/* PVRDMA send queue work request */
-struct pvrdma_sq_wqe_hdr {
- __u64 wr_id; /* wr id */
- __u32 num_sge; /* size of s/g array */
- __u32 total_len; /* reserved */
- __u32 opcode; /* operation type */
- __u32 send_flags; /* wr flags */
- union {
- __u32 imm_data;
- __u32 invalidate_rkey;
- } ex;
- __u32 reserved;
- union {
- struct {
- __u64 remote_addr;
- __u32 rkey;
- __u8 reserved[4];
- } rdma;
- struct {
- __u64 remote_addr;
- __u64 compare_add;
- __u64 swap;
- __u32 rkey;
- __u32 reserved;
- } atomic;
- struct {
- __u64 remote_addr;
- __u32 log_arg_sz;
- __u32 rkey;
- union {
- struct pvrdma_ex_cmp_swap cmp_swap;
- struct pvrdma_ex_fetch_add fetch_add;
- } wr_data;
- } masked_atomics;
- struct {
- __u64 iova_start;
- __u64 pl_pdir_dma;
- __u32 page_shift;
- __u32 page_list_len;
- __u32 length;
- __u32 access_flags;
- __u32 rkey;
- } fast_reg;
- struct {
- __u32 remote_qpn;
- __u32 remote_qkey;
- struct pvrdma_av av;
- } ud;
- } wr;
-};
-/* Use pvrdma_sge (ib_sge) for send queue s/g array elements. */
-
-/* Completion queue element. */
-struct pvrdma_cqe {
- __u64 wr_id;
- __u64 qp;
- __u32 opcode;
- __u32 status;
- __u32 byte_len;
- __u32 imm_data;
- __u32 src_qp;
- __u32 wc_flags;
- __u32 vendor_err;
- __u16 pkey_index;
- __u16 slid;
- __u8 sl;
- __u8 dlid_path_bits;
- __u8 port_num;
- __u8 smac[6];
- __u8 reserved2[7]; /* Pad to next power of 2 (64). */
-};
-
-#endif /* __PVRDMA_ABI_H__ */
diff --git a/providers/pvrdma/pvrdma.h b/providers/pvrdma/pvrdma.h
index d3df07d..703cb5f 100644
--- a/providers/pvrdma/pvrdma.h
+++ b/providers/pvrdma/pvrdma.h
@@ -55,10 +55,10 @@
#include <netinet/in.h>
#include <sys/mman.h>
#include <infiniband/driver.h>
+#include <rdma/pvrdma-abi.h>
#define BIT(nr) (1UL << (nr))
-#include "pvrdma-abi.h"
#include "pvrdma_ring.h"
#ifndef rmb
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
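Note on the ABI header moved in the patch above: the comment on `reserved2` says the completion queue element is padded to 64 bytes. A minimal sketch of how that layout claim could be checked at compile time follows. It assumes `<stdint.h>` fixed-width types in place of the kernel `__u*` types so that it builds without kernel headers; `struct cqe_layout` and `cqe_offset()` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct pvrdma_cqe, field for field. */
struct cqe_layout {
	uint64_t wr_id;
	uint64_t qp;
	uint32_t opcode;
	uint32_t status;
	uint32_t byte_len;
	uint32_t imm_data;
	uint32_t src_qp;
	uint32_t wc_flags;
	uint32_t vendor_err;
	uint16_t pkey_index;
	uint16_t slid;
	uint8_t sl;
	uint8_t dlid_path_bits;
	uint8_t port_num;
	uint8_t smac[6];
	uint8_t reserved2[7];	/* pad to 64 bytes */
};

/* Fail the build if the compiler inserts padding or a field moves. */
static_assert(sizeof(struct cqe_layout) == 64, "CQE must be 64 bytes");
static_assert(offsetof(struct cqe_layout, smac) == 51, "unexpected padding");

/* Byte offset of the idx-th element in a densely packed array of CQEs. */
static size_t cqe_offset(size_t idx)
{
	return idx * sizeof(struct cqe_layout);
}
```

Because every field is naturally aligned and the members sum to exactly 64 bytes, no `packed` attribute is needed for this layout on common ABIs.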
* [PATCH 7/8] libpvrdma: Add to consolidated rdma-core
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Update the build scripts and infrastructure for the pvrdma user library.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
CMakeLists.txt | 1 +
MAINTAINERS | 6 ++++++
README.md | 1 +
providers/pvrdma/CMakeLists.txt | 6 ++++++
4 files changed, 14 insertions(+)
create mode 100644 providers/pvrdma/CMakeLists.txt
diff --git a/CMakeLists.txt b/CMakeLists.txt
index b3b3ff1..2010265 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -335,6 +335,7 @@ add_subdirectory(providers/mlx5)
add_subdirectory(providers/mthca)
add_subdirectory(providers/nes)
add_subdirectory(providers/ocrdma)
+add_subdirectory(providers/pvrdma)
add_subdirectory(providers/qedr)
add_subdirectory(providers/rxe)
add_subdirectory(providers/rxe/man)
diff --git a/MAINTAINERS b/MAINTAINERS
index d83de10..69ab1f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -139,6 +139,12 @@ M: Devesh Sharma <Devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
S: Supported
F: providers/ocrdma/
+PVRDMA USERSPACE PROVIDER (for pvrdma.ko)
+M: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
+L: pv-drivers-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org
+S: Supported
+F: providers/pvrdma/
+
QEDR USERSPACE PROVIDER (for qedr.ko)
M: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
M: Ariel Elior <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
diff --git a/README.md b/README.md
index 3a13042..fed8803 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ is included:
- ib_mthca.ko
- iw_nes.ko
- ocrdma.ko
+ - pvrdma.ko
- qedr.ko
- rdma_rxe.ko
diff --git a/providers/pvrdma/CMakeLists.txt b/providers/pvrdma/CMakeLists.txt
new file mode 100644
index 0000000..8ba9a45
--- /dev/null
+++ b/providers/pvrdma/CMakeLists.txt
@@ -0,0 +1,6 @@
+rdma_provider(pvrdma
+ cq.c
+ pvrdma_main.c
+ qp.c
+ verbs.c
+)
--
2.7.4
* [PATCH 6/8] libpvrdma: Add main library file
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Registers the pvrdma library with libibverbs and allocates the user context.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
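Note on the registration mechanism: the patch registers the provider from a function marked `__attribute__((constructor))`, which GCC and Clang run when the object is loaded, before `main()`. The sketch below illustrates that pattern in isolation. The registry (`register_driver()`, `find_driver()`), the simplified init callback, and the `demo` driver are all hypothetical stand-ins; the real `ibv_register_driver()` callback takes a sysfs path and an ABI version and returns a device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS 8
#define DEMO_ABI_VERSION 3	/* same value as PVRDMA_UVERBS_ABI_VERSION */

/* Hypothetical, simplified init callback: 0 on success, -1 on failure. */
typedef int (*driver_init_fn)(int abi_version);

struct driver_entry {
	const char *name;
	driver_init_fn init;
};

static struct driver_entry drivers[MAX_DRIVERS];
static size_t num_drivers;

/* Hypothetical registry standing in for ibv_register_driver(). */
static void register_driver(const char *name, driver_init_fn init)
{
	assert(name && init);
	if (num_drivers < MAX_DRIVERS) {
		drivers[num_drivers].name = name;
		drivers[num_drivers].init = init;
		num_drivers++;
	}
}

/* Look up a registered driver by name; returns NULL if it is unknown. */
static driver_init_fn find_driver(const char *name)
{
	size_t i;

	for (i = 0; i < num_drivers; i++)
		if (strcmp(drivers[i].name, name) == 0)
			return drivers[i].init;
	return NULL;
}

/* Accept only a single ABI version, as the patch does. */
static int demo_init(int abi_version)
{
	return abi_version == DEMO_ABI_VERSION ? 0 : -1;
}

/* Runs at load time, so the driver is registered before main() starts. */
static __attribute__((constructor)) void demo_register(void)
{
	register_driver("demo", demo_init);
}
```

The appeal of this design is that loading the shared object is enough to make the provider discoverable; no explicit init call is required from the application.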
providers/pvrdma/pvrdma_main.c | 214 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 214 insertions(+)
create mode 100644 providers/pvrdma/pvrdma_main.c
diff --git a/providers/pvrdma/pvrdma_main.c b/providers/pvrdma/pvrdma_main.c
new file mode 100644
index 0000000..909cf1e
--- /dev/null
+++ b/providers/pvrdma/pvrdma_main.c
@@ -0,0 +1,214 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "pvrdma.h"
+
+static struct ibv_context_ops pvrdma_ctx_ops = {
+ .query_device = pvrdma_query_device,
+ .query_port = pvrdma_query_port,
+ .alloc_pd = pvrdma_alloc_pd,
+ .dealloc_pd = pvrdma_free_pd,
+
+ .reg_mr = pvrdma_reg_mr,
+ .dereg_mr = pvrdma_dereg_mr,
+ .create_cq = pvrdma_create_cq,
+ .poll_cq = pvrdma_poll_cq,
+ .req_notify_cq = pvrdma_req_notify_cq,
+ .destroy_cq = pvrdma_destroy_cq,
+
+ .create_qp = pvrdma_create_qp,
+ .query_qp = pvrdma_query_qp,
+ .modify_qp = pvrdma_modify_qp,
+ .destroy_qp = pvrdma_destroy_qp,
+
+ .post_send = pvrdma_post_send,
+ .post_recv = pvrdma_post_recv,
+ .create_ah = pvrdma_create_ah,
+ .destroy_ah = pvrdma_destroy_ah,
+};
+
+int pvrdma_alloc_buf(struct pvrdma_buf *buf, size_t size, int page_size)
+{
+ int ret;
+
+ buf->length = align(size, page_size);
+ buf->buf = mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (buf->buf == MAP_FAILED)
+ return errno;
+
+ ret = ibv_dontfork_range(buf->buf, size);
+ if (ret)
+ munmap(buf->buf, buf->length);
+
+ return ret;
+}
+
+void pvrdma_free_buf(struct pvrdma_buf *buf)
+{
+ ibv_dofork_range(buf->buf, buf->length);
+ munmap(buf->buf, buf->length);
+}
+
+static int pvrdma_init_context_shared(struct pvrdma_context *context,
+ struct ibv_device *ibdev,
+ int cmd_fd)
+{
+ struct ibv_get_context cmd;
+ struct pvrdma_alloc_ucontext_resp resp;
+
+ context->ibv_ctx.cmd_fd = cmd_fd;
+ if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp)))
+ return errno;
+
+ context->qp_tbl = calloc(resp.qp_tab_size & 0xFFFF,
+ sizeof(struct pvrdma_qp *));
+ if (!context->qp_tbl)
+ return -ENOMEM;
+
+ context->uar = mmap(NULL, to_vdev(ibdev)->page_size, PROT_WRITE,
+ MAP_SHARED, cmd_fd, 0);
+ if (context->uar == MAP_FAILED) {
+ free(context->qp_tbl);
+ return errno;
+ }
+
+ pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+ context->ibv_ctx.ops = pvrdma_ctx_ops;
+
+ return 0;
+}
+
+static void pvrdma_free_context_shared(struct pvrdma_context *context,
+ struct pvrdma_device *dev)
+{
+ munmap(context->uar, dev->page_size);
+ free(context->qp_tbl);
+}
+
+static struct ibv_context *pvrdma_alloc_context(struct ibv_device *ibdev,
+ int cmd_fd)
+{
+ struct pvrdma_context *context;
+
+ context = malloc(sizeof(*context));
+ if (!context)
+ return NULL;
+
+ memset(context, 0, sizeof(*context));
+
+ if (pvrdma_init_context_shared(context, ibdev, cmd_fd)) {
+ free(context);
+ return NULL;
+ }
+
+ return &context->ibv_ctx;
+}
+
+static void pvrdma_free_context(struct ibv_context *ibctx)
+{
+ struct pvrdma_context *context = to_vctx(ibctx);
+
+ pvrdma_free_context_shared(context, to_vdev(ibctx->device));
+ free(context);
+}
+
+static struct ibv_device_ops pvrdma_dev_ops = {
+ .alloc_context = pvrdma_alloc_context,
+ .free_context = pvrdma_free_context
+};
+
+static struct pvrdma_device *pvrdma_driver_init_shared(
+ const char *uverbs_sys_path,
+ int abi_version)
+{
+ struct pvrdma_device *dev;
+ char name[16];
+
+ /* We support only a single ABI version for now. */
+ if (abi_version != PVRDMA_UVERBS_ABI_VERSION) {
+ fprintf(stderr, PFX "ABI version %d of %s is not "
+ "supported (supported %d)\n",
+ abi_version, uverbs_sys_path,
+ PVRDMA_UVERBS_ABI_VERSION);
+ return NULL;
+ }
+
+ if (ibv_read_sysfs_file(uverbs_sys_path,
+ "ibdev", name, sizeof(name)) < 0) {
+ fprintf(stderr, PFX "not ib device\n");
+ return NULL;
+ }
+
+ dev = malloc(sizeof(*dev));
+ if (!dev) {
+ fprintf(stderr, PFX "couldn't allocate device for %s\n",
+ uverbs_sys_path);
+ return NULL;
+ }
+
+ dev->abi_version = abi_version;
+ dev->page_size = sysconf(_SC_PAGESIZE);
+ dev->ibv_dev.ops = pvrdma_dev_ops;
+
+ return dev;
+}
+
+static struct ibv_device *pvrdma_driver_init(const char *uverbs_sys_path,
+ int abi_version)
+{
+ struct pvrdma_device *dev = pvrdma_driver_init_shared(uverbs_sys_path,
+ abi_version);
+ if (!dev)
+ return NULL;
+
+ return &dev->ibv_dev;
+}
+
+static __attribute__((constructor)) void pvrdma_register_driver(void)
+{
+ ibv_register_driver("pvrdma", pvrdma_driver_init);
+}
--
2.7.4
* [PATCH 5/8] libpvrdma: Add misc verbs functions
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
This includes other verbs functions that don't necessarily fit anywhere else.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
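Note on the GID to MAC conversion: for link-local GIDs the patch recovers the destination MAC from the modified EUI-64 interface identifier, taking bytes 8-10 and 13-15 and flipping the universal/local bit. A minimal standalone sketch of that step is below. The names `gid_is_link_local()` and `gid_to_mac()` are hypothetical, and unlike `set_mac_from_gid()` in the patch this version reports an error for other GID types instead of silently leaving the MAC untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the 16-byte GID starts with the fe80::/64 prefix. */
static int gid_is_link_local(const uint8_t gid[16])
{
	static const uint8_t prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/*
 * Recover the MAC from a link-local GID built from a modified EUI-64.
 * Returns 0 on success, or -1 (leaving mac untouched) for other GIDs.
 */
static int gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	assert(gid && mac);

	if (!gid_is_link_local(gid))
		return -1;

	memcpy(mac, gid + 8, 3);	/* first three MAC bytes */
	memcpy(mac + 3, gid + 13, 3);	/* skip the ff:fe filler at 11-12 */
	mac[0] ^= 0x02;			/* undo the universal/local bit flip */

	return 0;
}
```

For example, the GID `fe80::0250:56ff:fe12:3456` maps back to the MAC `00:50:56:12:34:56`.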
providers/pvrdma/verbs.c | 234 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 234 insertions(+)
create mode 100644 providers/pvrdma/verbs.c
diff --git a/providers/pvrdma/verbs.c b/providers/pvrdma/verbs.c
new file mode 100644
index 0000000..1646708
--- /dev/null
+++ b/providers/pvrdma/verbs.c
@@ -0,0 +1,234 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "pvrdma.h"
+
+int pvrdma_query_device(struct ibv_context *context,
+ struct ibv_device_attr *attr)
+{
+ struct ibv_query_device cmd;
+ uint64_t raw_fw_ver;
+ unsigned major, minor, sub_minor;
+ int ret;
+
+ ret = ibv_cmd_query_device(context, attr, &raw_fw_ver,
+ &cmd, sizeof(cmd));
+ if (ret)
+ return ret;
+
+ major = (raw_fw_ver >> 32) & 0xffff;
+ minor = (raw_fw_ver >> 16) & 0xffff;
+ sub_minor = raw_fw_ver & 0xffff;
+
+ snprintf(attr->fw_ver, sizeof(attr->fw_ver),
+ "%u.%u.%03u", major, minor, sub_minor);
+
+ return 0;
+}
+
+int pvrdma_query_port(struct ibv_context *context, uint8_t port,
+ struct ibv_port_attr *attr)
+{
+ struct ibv_query_port cmd;
+
+ return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
+}
+
+struct ibv_pd *pvrdma_alloc_pd(struct ibv_context *context)
+{
+ struct ibv_alloc_pd cmd;
+ struct pvrdma_alloc_pd_resp resp;
+ struct pvrdma_pd *pd;
+
+ pd = malloc(sizeof(*pd));
+ if (!pd)
+ return NULL;
+
+ if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp))) {
+ free(pd);
+ return NULL;
+ }
+
+ pd->pdn = resp.pdn;
+
+ return &pd->ibv_pd;
+}
+
+int pvrdma_free_pd(struct ibv_pd *pd)
+{
+ int ret;
+
+ ret = ibv_cmd_dealloc_pd(pd);
+ if (ret)
+ return ret;
+
+ free(to_vpd(pd));
+
+ return 0;
+}
+
+struct ibv_mr *pvrdma_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+ int access)
+{
+ struct ibv_mr *mr;
+ struct ibv_reg_mr cmd;
+ struct ibv_reg_mr_resp resp;
+ int ret;
+
+ mr = malloc(sizeof(*mr));
+ if (!mr)
+ return NULL;
+
+ ret = ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr,
+ access, mr, &cmd, sizeof(cmd),
+ &resp, sizeof(resp));
+ if (ret) {
+ free(mr);
+ return NULL;
+ }
+
+ return mr;
+}
+
+int pvrdma_dereg_mr(struct ibv_mr *mr)
+{
+ int ret;
+
+ ret = ibv_cmd_dereg_mr(mr);
+ if (ret)
+ return ret;
+
+ free(mr);
+
+ return 0;
+}
+
+static int is_multicast_gid(const union ibv_gid *gid)
+{
+ return gid->raw[0] == 0xff;
+}
+
+static int is_link_local_gid(const union ibv_gid *gid)
+{
+ uint32_t *hi = (uint32_t *)(gid->raw);
+ uint32_t *lo = (uint32_t *)(gid->raw + 4);
+ if (hi[0] == htonl(0xfe800000) && lo[0] == 0)
+ return 1;
+
+ return 0;
+}
+
+static int is_ipv6_addr_v4mapped(const struct in6_addr *a)
+{
+ return ((a->s6_addr32[0] | a->s6_addr32[1]) |
+ (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL ||
+ /* IPv4 encoded multicast addresses */
+ (a->s6_addr32[0] == htonl(0xff0e0000) &&
+ ((a->s6_addr32[1] |
+ (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL));
+}
+
+static void set_mac_from_gid(const union ibv_gid *gid,
+ __u8 mac[6])
+{
+ if (is_link_local_gid(gid)) {
+ /*
+ * The MAC is embedded in GID[8-10,13-15] with the
+ * 7th most significant bit inverted.
+ */
+ memcpy(mac, gid->raw + 8, 3);
+ memcpy(mac + 3, gid->raw + 13, 3);
+ mac[0] ^= 2;
+ }
+}
+
+struct ibv_ah *pvrdma_create_ah(struct ibv_pd *pd,
+ struct ibv_ah_attr *attr)
+{
+ struct pvrdma_ah *ah;
+ struct pvrdma_av *av;
+ struct ibv_port_attr port_attr;
+
+ if (!attr->is_global)
+ return NULL;
+
+ if (ibv_query_port(pd->context, attr->port_num, &port_attr))
+ return NULL;
+
+ if (port_attr.link_layer == IBV_LINK_LAYER_UNSPECIFIED ||
+ port_attr.link_layer == IBV_LINK_LAYER_INFINIBAND)
+ return NULL;
+
+ if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET &&
+ (!is_link_local_gid(&attr->grh.dgid) &&
+ !is_multicast_gid(&attr->grh.dgid) &&
+ !is_ipv6_addr_v4mapped((struct in6_addr *)attr->grh.dgid.raw)))
+ return NULL;
+
+ ah = calloc(1, sizeof(*ah));
+ if (!ah)
+ return NULL;
+
+ av = &ah->av;
+ av->port_pd = to_vpd(pd)->pdn | (attr->port_num << 24);
+ av->src_path_bits = attr->src_path_bits;
+ av->src_path_bits |= 0x80;
+ av->gid_index = attr->grh.sgid_index;
+ av->hop_limit = attr->grh.hop_limit;
+ av->sl_tclass_flowlabel = (attr->grh.traffic_class << 20) |
+ attr->grh.flow_label;
+ memcpy(av->dgid, attr->grh.dgid.raw, 16);
+ set_mac_from_gid(&attr->grh.dgid, av->dmac);
+
+ return &ah->ibv_ah;
+}
+
+int pvrdma_destroy_ah(struct ibv_ah *ah)
+{
+ free(to_vah(ah));
+
+ return 0;
+}
--
2.7.4
* [PATCH 4/8] libpvrdma: Add queue pair functions
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Add functions to create, destroy, and post work requests on queue pairs.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
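Note on queue sizing: `pvrdma_create_qp()` rounds the requested work request and s/g counts up to powers of two, then rounds each WQE (header plus s/g array) up to a power of two as well. The sketch below illustrates that arithmetic. `next_pow2()` is a hypothetical stand-in for `align_next_power2()`, whose definition is not part of this patch, and `wqe_size()` is an illustrative helper rather than a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for align_next_power2(): smallest power of two >= x. */
static uint32_t next_pow2(uint32_t x)
{
	uint32_t p = 1;

	assert(x <= (UINT32_C(1) << 31));	/* larger values do not fit */
	while (p < x)
		p <<= 1;

	return p;
}

/*
 * Bytes for one work queue entry: a fixed header followed by an s/g
 * array whose length is itself rounded up to a power of two.
 */
static uint32_t wqe_size(uint32_t hdr_bytes, uint32_t sge_bytes,
			 uint32_t max_sge)
{
	uint32_t nsge = next_pow2(max_sge ? max_sge : 1);

	return next_pow2(hdr_bytes + sge_bytes * nsge);
}
```

With the 16-byte receive header and 16-byte s/g entries from the ABI header, a request for 3 s/g entries is rounded to 4, giving 16 + 4 * 16 = 80 bytes, which rounds up to a 128-byte WQE. Power-of-two sizes let ring indices wrap with a mask instead of a division.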
providers/pvrdma/qp.c | 505 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 505 insertions(+)
create mode 100644 providers/pvrdma/qp.c
diff --git a/providers/pvrdma/qp.c b/providers/pvrdma/qp.c
new file mode 100644
index 0000000..46a0e32
--- /dev/null
+++ b/providers/pvrdma/qp.c
@@ -0,0 +1,505 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <infiniband/arch.h>
+
+#include "pvrdma.h"
+
+int pvrdma_alloc_qp_buf(struct pvrdma_device *dev, struct ibv_qp_cap *cap,
+ enum ibv_qp_type type, struct pvrdma_qp *qp)
+{
+ qp->sq.wrid = malloc(qp->sq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->sq.wrid)
+ return -1;
+
+ qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->rq.wrid) {
+ free(qp->sq.wrid);
+ return -1;
+ }
+
+ /* Align page size for [rq][sq] */
+ qp->rbuf.length = align(qp->rq.offset +
+ qp->rq.wqe_cnt * qp->rq.wqe_size,
+ dev->page_size);
+ qp->sbuf.length = align(qp->sq.offset +
+ qp->sq.wqe_cnt * qp->sq.wqe_size,
+ dev->page_size);
+ qp->buf_size = qp->rbuf.length + qp->sbuf.length;
+
+ if (pvrdma_alloc_buf(&qp->rbuf, qp->rbuf.length, dev->page_size)) {
+ free(qp->sq.wrid);
+ free(qp->rq.wrid);
+ return -1;
+ }
+
+ if (pvrdma_alloc_buf(&qp->sbuf, qp->sbuf.length, dev->page_size)) {
+ free(qp->sq.wrid);
+ free(qp->rq.wrid);
+ pvrdma_free_buf(&qp->rbuf);
+ return -1;
+ }
+
+ memset(qp->rbuf.buf, 0, qp->rbuf.length);
+ memset(qp->sbuf.buf, 0, qp->sbuf.length);
+
+ return 0;
+}
+
+static void pvrdma_init_qp_queue(struct pvrdma_qp *qp)
+{
+ atomic_set(&(qp->sq.ring_state->cons_head), 0);
+ atomic_set(&(qp->sq.ring_state->prod_tail), 0);
+ atomic_set(&(qp->rq.ring_state->cons_head), 0);
+ atomic_set(&(qp->rq.ring_state->prod_tail), 0);
+}
+
+struct ibv_qp *pvrdma_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr)
+{
+ struct pvrdma_device *dev = to_vdev(pd->context->device);
+ struct pvrdma_create_qp cmd;
+ struct ibv_create_qp_resp resp;
+ struct pvrdma_qp *qp;
+ int ret;
+
+ attr->cap.max_recv_sge =
+ align_next_power2(max(1U, attr->cap.max_recv_sge));
+ attr->cap.max_recv_wr =
+ align_next_power2(max(1U, attr->cap.max_recv_wr));
+ attr->cap.max_send_sge =
+ align_next_power2(max(1U, attr->cap.max_send_sge));
+ attr->cap.max_send_wr =
+ align_next_power2(max(1U, attr->cap.max_send_wr));
+
+ qp = calloc(1, sizeof(*qp));
+ if (!qp)
+ return NULL;
+
+ qp->rq.max_gs = attr->cap.max_recv_sge;
+ qp->rq.wqe_cnt = attr->cap.max_recv_wr;
+ qp->rq.offset = 0;
+ qp->rq.wqe_size = align_next_power2(sizeof(struct pvrdma_rq_wqe_hdr) +
+ sizeof(struct ibv_sge) *
+ qp->rq.max_gs);
+
+ qp->sq.max_gs = attr->cap.max_send_sge;
+ qp->sq.wqe_cnt = attr->cap.max_send_wr;
+ /* Extra page for shared ring state */
+ qp->sq.offset = dev->page_size;
+ qp->sq.wqe_size = align_next_power2(sizeof(struct pvrdma_sq_wqe_hdr) +
+ sizeof(struct ibv_sge) *
+ qp->sq.max_gs);
+
+ /* Reset attr.cap, no srq for now */
+ if (attr->srq) {
+ attr->cap.max_recv_wr = 0;
+ qp->rq.wqe_cnt = 0;
+ }
+
+ /* Allocate [rq][sq] memory */
+ if (pvrdma_alloc_qp_buf(dev, &attr->cap, attr->qp_type, qp))
+ goto err;
+
+ qp->sq.ring_state = qp->sbuf.buf;
+ qp->rq.ring_state = (struct pvrdma_ring *)&qp->sq.ring_state[1];
+ pvrdma_init_qp_queue(qp);
+
+ if (pthread_spin_init(&qp->sq.lock, PTHREAD_PROCESS_PRIVATE) ||
+ pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE))
+ goto err_free;
+
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.rbuf_addr = (uintptr_t)qp->rbuf.buf;
+ cmd.rbuf_size = qp->rbuf.length;
+ cmd.sbuf_addr = (uintptr_t)qp->sbuf.buf;
+ cmd.sbuf_size = qp->sbuf.length;
+ cmd.qp_addr = (uintptr_t) qp;
+
+ ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr,
+ &cmd.ibv_cmd, sizeof(cmd),
+ &resp, sizeof(resp));
+
+ if (ret)
+ goto err_free;
+
+ to_vctx(pd->context)->qp_tbl[qp->ibv_qp.qp_num & 0xFFFF] = qp;
+
+ /* If set, each WR submitted to the SQ generates a completion entry */
+ if (attr->sq_sig_all)
+ qp->sq_signal_bits = htonl(PVRDMA_WQE_CTRL_CQ_UPDATE);
+ else
+ qp->sq_signal_bits = 0;
+
+ return &qp->ibv_qp;
+
+err_free:
+ if (qp->sq.wqe_cnt)
+ free(qp->sq.wrid);
+ if (qp->rq.wqe_cnt)
+ free(qp->rq.wrid);
+ pvrdma_free_buf(&qp->rbuf);
+ pvrdma_free_buf(&qp->sbuf);
+err:
+ free(qp);
+
+ return NULL;
+}
+
+int pvrdma_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask,
+ struct ibv_qp_init_attr *init_attr)
+{
+ struct ibv_query_qp cmd;
+ struct pvrdma_qp *qp = to_vqp(ibqp);
+ int ret;
+
+ ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr,
+ &cmd, sizeof(cmd));
+ if (ret)
+ return ret;
+
+ /* Passing back */
+ init_attr->cap.max_send_wr = qp->sq.wqe_cnt;
+ init_attr->cap.max_send_sge = qp->sq.max_gs;
+ init_attr->cap.max_inline_data = qp->max_inline_data;
+
+ attr->cap = init_attr->cap;
+
+ return 0;
+}
+
+int pvrdma_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask)
+{
+ struct ibv_modify_qp cmd;
+ int ret;
+
+ /* Sanity check */
+ if (!attr_mask)
+ return 0;
+
+ ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+ if (!ret &&
+ (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RESET) {
+ pvrdma_cq_clean(to_vcq(qp->recv_cq), qp->qp_num);
+ if (qp->send_cq != qp->recv_cq)
+ pvrdma_cq_clean(to_vcq(qp->send_cq), qp->qp_num);
+ pvrdma_init_qp_queue(to_vqp(qp));
+ }
+
+ return ret;
+}
+
+static void pvrdma_lock_cqs(struct ibv_qp *qp)
+{
+ struct pvrdma_cq *send_cq = to_vcq(qp->send_cq);
+ struct pvrdma_cq *recv_cq = to_vcq(qp->recv_cq);
+
+ if (send_cq == recv_cq)
+ pthread_spin_lock(&send_cq->lock);
+ else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_lock(&send_cq->lock);
+ pthread_spin_lock(&recv_cq->lock);
+ } else {
+ pthread_spin_lock(&recv_cq->lock);
+ pthread_spin_lock(&send_cq->lock);
+ }
+}
+
+static void pvrdma_unlock_cqs(struct ibv_qp *qp)
+{
+ struct pvrdma_cq *send_cq = to_vcq(qp->send_cq);
+ struct pvrdma_cq *recv_cq = to_vcq(qp->recv_cq);
+
+ if (send_cq == recv_cq)
+ pthread_spin_unlock(&send_cq->lock);
+ else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_unlock(&recv_cq->lock);
+ pthread_spin_unlock(&send_cq->lock);
+ } else {
+ pthread_spin_unlock(&send_cq->lock);
+ pthread_spin_unlock(&recv_cq->lock);
+ }
+}
+
+int pvrdma_destroy_qp(struct ibv_qp *ibqp)
+{
+ struct pvrdma_context *ctx = to_vctx(ibqp->context);
+ struct pvrdma_qp *qp = to_vqp(ibqp);
+ int ret;
+
+ ret = ibv_cmd_destroy_qp(ibqp);
+ if (ret) {
+ return ret;
+ }
+
+ pvrdma_lock_cqs(ibqp);
+ /* Flush this QP's completions from its CQs */
+ __pvrdma_cq_clean(to_vcq(ibqp->recv_cq), ibqp->qp_num);
+
+ if (ibqp->send_cq != ibqp->recv_cq)
+ __pvrdma_cq_clean(to_vcq(ibqp->send_cq), ibqp->qp_num);
+ pvrdma_unlock_cqs(ibqp);
+
+ free(qp->sq.wrid);
+ free(qp->rq.wrid);
+ pvrdma_free_buf(&qp->rbuf);
+ pvrdma_free_buf(&qp->sbuf);
+ ctx->qp_tbl[ibqp->qp_num & 0xFFFF] = NULL;
+ free(qp);
+
+ return 0;
+}
+
+static void *get_rq_wqe(struct pvrdma_qp *qp, int n)
+{
+ return qp->rbuf.buf + qp->rq.offset + (n * qp->rq.wqe_size);
+}
+
+static void *get_sq_wqe(struct pvrdma_qp *qp, int n)
+{
+ return qp->sbuf.buf + qp->sq.offset + (n * qp->sq.wqe_size);
+}
+
+int pvrdma_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr)
+{
+ struct pvrdma_context *ctx = to_vctx(ibqp->context);
+ struct pvrdma_qp *qp = to_vqp(ibqp);
+ int ind;
+ int nreq = 0;
+ struct pvrdma_sq_wqe_hdr *wqe_hdr;
+ struct ibv_sge *sge;
+ int ret = 0;
+ int i;
+
+ /*
+ * In states lower than RTS, we can fail immediately. In other states,
+ * just post and let the device figure it out.
+ */
+ if (ibqp->state < IBV_QPS_RTS) {
+ *bad_wr = wr;
+ return EINVAL;
+ }
+
+ pthread_spin_lock(&qp->sq.lock);
+ ind = pvrdma_idx(&(qp->sq.ring_state->prod_tail), qp->sq.wqe_cnt);
+ if (ind < 0) {
+ pthread_spin_unlock(&qp->sq.lock);
+ *bad_wr = wr;
+ return EINVAL;
+ }
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ unsigned int tail;
+
+ if (pvrdma_idx_ring_has_space(qp->sq.ring_state,
+ qp->sq.wqe_cnt, &tail) <= 0) {
+ ret = ENOMEM;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge > qp->sq.max_gs) {
+ ret = EINVAL;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ wqe_hdr = (struct pvrdma_sq_wqe_hdr *)get_sq_wqe(qp, ind);
+ wqe_hdr->wr_id = wr->wr_id;
+ wqe_hdr->num_sge = wr->num_sge;
+ wqe_hdr->opcode = ibv_wr_opcode_to_pvrdma(wr->opcode);
+ wqe_hdr->send_flags = ibv_send_flags_to_pvrdma(wr->send_flags);
+ if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
+ wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
+ wqe_hdr->ex.imm_data = wr->imm_data;
+
+ switch (ibqp->qp_type) {
+ case IBV_QPT_UD:
+ wqe_hdr->wr.ud.remote_qpn = wr->wr.ud.remote_qpn;
+ wqe_hdr->wr.ud.remote_qkey = wr->wr.ud.remote_qkey;
+ wqe_hdr->wr.ud.av = to_vah(wr->wr.ud.ah)->av;
+ break;
+ case IBV_QPT_RC:
+ switch (wr->opcode) {
+ case IBV_WR_RDMA_READ:
+ case IBV_WR_RDMA_WRITE:
+ case IBV_WR_RDMA_WRITE_WITH_IMM:
+ wqe_hdr->wr.rdma.remote_addr =
+ wr->wr.rdma.remote_addr;
+ wqe_hdr->wr.rdma.rkey = wr->wr.rdma.rkey;
+ break;
+ case IBV_WR_ATOMIC_CMP_AND_SWP:
+ case IBV_WR_ATOMIC_FETCH_AND_ADD:
+ wqe_hdr->wr.atomic.remote_addr = wr->wr.atomic.remote_addr;
+ wqe_hdr->wr.atomic.rkey = wr->wr.atomic.rkey;
+ wqe_hdr->wr.atomic.compare_add = wr->wr.atomic.compare_add;
+ if (wr->opcode == IBV_WR_ATOMIC_CMP_AND_SWP)
+ wqe_hdr->wr.atomic.swap = wr->wr.atomic.swap;
+ break;
+ default:
+ /* No extra segments required for sends */
+ break;
+ }
+ break;
+ default:
+ fprintf(stderr, PFX "unsupported QP type for post send\n");
+ ret = EINVAL;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ /* Write each segment */
+ sge = (struct ibv_sge *)&wqe_hdr[1];
+ for (i = 0; i < wr->num_sge; i++) {
+ sge->addr = wr->sg_list[i].addr;
+ sge->length = wr->sg_list[i].length;
+ sge->lkey = wr->sg_list[i].lkey;
+ sge++;
+ }
+
+ pvrdma_idx_ring_inc(&(qp->sq.ring_state->prod_tail),
+ qp->sq.wqe_cnt);
+
+ wmb();
+
+ qp->sq.wrid[ind] = wr->wr_id;
+ ++ind;
+ if (ind >= qp->sq.wqe_cnt)
+ ind = 0;
+ }
+
+out:
+ if (nreq)
+ pvrdma_write_uar_qp(ctx->uar,
+ PVRDMA_UAR_QP_SEND | ibqp->qp_num);
+
+ wmb();
+ pthread_spin_unlock(&qp->sq.lock);
+
+ return ret;
+}
+
+int pvrdma_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr)
+{
+ struct pvrdma_context *ctx = to_vctx(ibqp->context);
+ struct pvrdma_qp *qp = to_vqp(ibqp);
+ struct pvrdma_rq_wqe_hdr *wqe_hdr;
+ struct ibv_sge *sge;
+ int nreq;
+ int ind;
+ int i;
+ int ret = 0;
+
+ if (!wr || !bad_wr)
+ return EINVAL;
+
+ /*
+ * In the RESET state, we can fail immediately. For other states,
+ * just post and let the device figure it out.
+ */
+ if (ibqp->state == IBV_QPS_RESET) {
+ *bad_wr = wr;
+ return EINVAL;
+ }
+
+ pthread_spin_lock(&qp->rq.lock);
+
+ ind = pvrdma_idx(&(qp->rq.ring_state->prod_tail), qp->rq.wqe_cnt);
+ if (ind < 0) {
+ pthread_spin_unlock(&qp->rq.lock);
+ *bad_wr = wr;
+ return EINVAL;
+ }
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ unsigned int tail;
+
+ if (pvrdma_idx_ring_has_space(qp->rq.ring_state,
+ qp->rq.wqe_cnt, &tail) <= 0) {
+ ret = ENOMEM;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge > qp->rq.max_gs) {
+ ret = EINVAL;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ /* Fetch wqe */
+ wqe_hdr = (struct pvrdma_rq_wqe_hdr *)get_rq_wqe(qp, ind);
+ wqe_hdr->wr_id = wr->wr_id;
+ wqe_hdr->num_sge = wr->num_sge;
+
+ sge = (struct ibv_sge *)(wqe_hdr + 1);
+ for (i = 0; i < wr->num_sge; ++i) {
+ sge->addr = (uint64_t)wr->sg_list[i].addr;
+ sge->length = wr->sg_list[i].length;
+ sge->lkey = wr->sg_list[i].lkey;
+ sge++;
+ }
+
+ pvrdma_idx_ring_inc(&qp->rq.ring_state->prod_tail,
+ qp->rq.wqe_cnt);
+
+ qp->rq.wrid[ind] = wr->wr_id;
+ ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+ }
+
+out:
+ if (nreq)
+ pvrdma_write_uar_qp(ctx->uar,
+ PVRDMA_UAR_QP_RECV | ibqp->qp_num);
+
+ pthread_spin_unlock(&qp->rq.lock);
+ return ret;
+}
--
2.7.4
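
Note on the CQ locking helpers in the patch above: pvrdma_lock_cqs() takes the lock of the CQ with the lower cqn first, and pvrdma_unlock_cqs() releases in the reverse order, so two threads tearing down QPs that share the same pair of CQs cannot deadlock on each other. The following is only a minimal standalone sketch of that ordering rule, not the provider's code: it swaps in pthread_mutex_t for the spinlocks, and struct demo_cq, demo_lock_pair() and demo_unlock_pair() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pvrdma_cq: only what the rule needs. */
struct demo_cq {
	uint32_t cqn;         /* unique queue number, used as the lock rank */
	pthread_mutex_t lock; /* the patch uses pthread_spinlock_t instead */
};

/* Acquire both locks, lower cqn first; a shared CQ is locked only once. */
static void demo_lock_pair(struct demo_cq *send_cq, struct demo_cq *recv_cq)
{
	assert(send_cq && recv_cq);

	if (send_cq == recv_cq) {
		pthread_mutex_lock(&send_cq->lock);
	} else if (send_cq->cqn < recv_cq->cqn) {
		pthread_mutex_lock(&send_cq->lock);
		pthread_mutex_lock(&recv_cq->lock);
	} else {
		pthread_mutex_lock(&recv_cq->lock);
		pthread_mutex_lock(&send_cq->lock);
	}
}

/* Release in the reverse of the acquisition order. */
static void demo_unlock_pair(struct demo_cq *send_cq, struct demo_cq *recv_cq)
{
	assert(send_cq && recv_cq);

	if (send_cq == recv_cq) {
		pthread_mutex_unlock(&send_cq->lock);
	} else if (send_cq->cqn < recv_cq->cqn) {
		pthread_mutex_unlock(&recv_cq->lock);
		pthread_mutex_unlock(&send_cq->lock);
	} else {
		pthread_mutex_unlock(&send_cq->lock);
		pthread_mutex_unlock(&recv_cq->lock);
	}
}
```

Because the rank is the queue number rather than the send/receive role, the acquisition order is the same no matter which CQ a given QP uses for sends and which for receives.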
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 3/8] libpvrdma: Add completion queue functions
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Add support for completion queue creation, destruction, polling and
events.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
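
Note on __pvrdma_cq_clean() below: as far as the diff shows, it walks the pending completions from the newest entry back toward the consumer head, slides entries that belong to other QPs toward the tail, and advances the consumer head once for every entry it drops. The following is a minimal standalone sketch of that compaction idea only. It tracks the ring with a plain head and count rather than the shared ring state, and struct demo_entry, struct demo_cq_ring, DEMO_RING_SIZE and demo_ring_clean() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u /* hypothetical ring size; must be a power of two */

struct demo_entry {
	uint32_t qpn;   /* QP that produced this completion */
	uint32_t wr_id; /* payload, kept only to check ordering */
};

struct demo_cq_ring {
	struct demo_entry slots[DEMO_RING_SIZE];
	uint32_t head;  /* slot of the oldest pending entry */
	uint32_t count; /* number of pending entries */
};

/* Drop every pending entry of qpn, keeping the others in order. */
static uint32_t demo_ring_clean(struct demo_cq_ring *r, uint32_t qpn)
{
	const uint32_t mask = DEMO_RING_SIZE - 1;
	uint32_t removed = 0;
	uint32_t n = r->count;
	/* src scans from newest to oldest; dst is where the next kept entry goes. */
	uint32_t src = (r->head + n - 1) & mask;
	uint32_t dst = src;

	assert(r->count <= DEMO_RING_SIZE);

	while (n-- > 0) {
		if (r->slots[src].qpn != qpn) {
			if (src != dst)
				r->slots[dst] = r->slots[src];
			dst = (dst - 1) & mask;
		} else {
			removed++;
		}
		src = (src - 1) & mask;
	}

	/* Kept entries now sit at the tail end, so the head moves forward. */
	r->head = (r->head + removed) & mask;
	r->count -= removed;
	return removed;
}
```

Compacting toward the tail means the producer index never has to move; only the consumer head advances, which matches the helper's use of pvrdma_idx_ring_inc() on cons_head.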
providers/pvrdma/cq.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 287 insertions(+)
create mode 100644 providers/pvrdma/cq.c
diff --git a/providers/pvrdma/cq.c b/providers/pvrdma/cq.c
new file mode 100644
index 0000000..f99873c
--- /dev/null
+++ b/providers/pvrdma/cq.c
@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <infiniband/arch.h>
+
+#include "pvrdma.h"
+
+enum {
+ CQ_OK = 0,
+ CQ_EMPTY = -1,
+ CQ_POLL_ERR = -2,
+};
+
+enum {
+ PVRDMA_CQE_IS_SEND_MASK = 0x40,
+ PVRDMA_CQE_OPCODE_MASK = 0x1f,
+};
+
+int pvrdma_alloc_cq_buf(struct pvrdma_device *dev, struct pvrdma_cq *cq,
+ struct pvrdma_buf *buf, int entries)
+{
+ if (pvrdma_alloc_buf(buf, cq->offset +
+ entries * (sizeof(struct pvrdma_cqe)),
+ dev->page_size))
+ return -1;
+ memset(buf->buf, 0, buf->length);
+
+ return 0;
+}
+
+static struct pvrdma_cqe *get_cqe(struct pvrdma_cq *cq, int entry)
+{
+ return cq->buf.buf + cq->offset +
+ entry * (sizeof(struct pvrdma_cqe));
+}
+
+static int pvrdma_poll_one(struct pvrdma_cq *cq,
+ struct pvrdma_qp **cur_qp,
+ struct ibv_wc *wc)
+{
+ struct pvrdma_context *ctx = to_vctx(cq->ibv_cq.context);
+ int has_data;
+ unsigned int head;
+ int tried = 0;
+ struct pvrdma_cqe *cqe;
+
+retry:
+ has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
+ cq->cqe_cnt, &head);
+ if (has_data == 0) {
+ unsigned int val;
+
+ if (tried)
+ return CQ_EMPTY;
+
+ /* Pass down POLL to give physical HCA a chance to poll. */
+ val = cq->cqn | PVRDMA_UAR_CQ_POLL;
+ pvrdma_write_uar_cq(ctx->uar, val);
+
+ tried = 1;
+ goto retry;
+ } else if (has_data == -1) {
+ return CQ_POLL_ERR;
+ }
+
+ cqe = get_cqe(cq, head);
+ if (!cqe)
+ return CQ_EMPTY;
+
+ rmb();
+
+ if (ctx->qp_tbl[cqe->qp & 0xFFFF])
+ *cur_qp = (struct pvrdma_qp *)ctx->qp_tbl[cqe->qp & 0xFFFF];
+ else
+ return CQ_POLL_ERR;
+
+ wc->opcode = pvrdma_wc_opcode_to_ibv(cqe->opcode);
+ wc->status = pvrdma_wc_status_to_ibv(cqe->status);
+ wc->wr_id = cqe->wr_id;
+ wc->qp_num = (*cur_qp)->ibv_qp.qp_num;
+ wc->byte_len = cqe->byte_len;
+ wc->imm_data = cqe->imm_data;
+ wc->src_qp = cqe->src_qp;
+ wc->wc_flags = cqe->wc_flags;
+ wc->pkey_index = cqe->pkey_index;
+ wc->slid = cqe->slid;
+ wc->sl = cqe->sl;
+ wc->dlid_path_bits = cqe->dlid_path_bits;
+ wc->vendor_err = 0;
+
+ /* Update shared ring state. */
+ pvrdma_idx_ring_inc(&(cq->ring_state->rx.cons_head), cq->cqe_cnt);
+
+ return CQ_OK;
+}
+
+int pvrdma_poll_cq(struct ibv_cq *ibcq, int num_entries, struct ibv_wc *wc)
+{
+ struct pvrdma_cq *cq = to_vcq(ibcq);
+ struct pvrdma_qp *qp;
+ int npolled = 0;
+
+ if (num_entries < 1 || wc == NULL)
+ return 0;
+
+ pthread_spin_lock(&cq->lock);
+
+ for (npolled = 0; npolled < num_entries; ++npolled) {
+ if (pvrdma_poll_one(cq, &qp, wc + npolled) != CQ_OK)
+ break;
+ }
+
+ pthread_spin_unlock(&cq->lock);
+
+ return npolled;
+}
+
+void __pvrdma_cq_clean(struct pvrdma_cq *cq, uint32_t qpn)
+{
+ /* Flush CQEs from specified QP */
+ int has_data;
+ unsigned int head;
+
+ /* Caller must hold cq->lock */
+ has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
+ cq->cqe_cnt, &head);
+
+ if (unlikely(has_data > 0)) {
+ int items;
+ int curr;
+ int tail = pvrdma_idx(&cq->ring_state->rx.prod_tail,
+ cq->cqe_cnt);
+ struct pvrdma_cqe *cqe;
+ struct pvrdma_cqe *curr_cqe;
+
+ items = (tail > head) ? (tail - head) :
+ (cq->cqe_cnt - head + tail);
+ curr = --tail;
+ while (items-- > 0) {
+ if (curr < 0)
+ curr = cq->cqe_cnt - 1;
+ if (tail < 0)
+ tail = cq->cqe_cnt - 1;
+ curr_cqe = get_cqe(cq, curr);
+ rmb();
+ if ((curr_cqe->qp & 0xFFFF) != qpn) {
+ if (curr != tail) {
+ cqe = get_cqe(cq, tail);
+ rmb();
+ *cqe = *curr_cqe;
+ }
+ tail--;
+ } else {
+ pvrdma_idx_ring_inc(
+ &cq->ring_state->rx.cons_head,
+ cq->cqe_cnt);
+ }
+ curr--;
+ }
+ }
+}
+
+void pvrdma_cq_clean(struct pvrdma_cq *cq, uint32_t qpn)
+{
+ pthread_spin_lock(&cq->lock);
+ __pvrdma_cq_clean(cq, qpn);
+ pthread_spin_unlock(&cq->lock);
+}
+
+struct ibv_cq *pvrdma_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector)
+{
+ struct pvrdma_device *dev = to_vdev(context->device);
+ struct pvrdma_create_cq cmd;
+ struct pvrdma_create_cq_resp resp;
+ struct pvrdma_cq *cq;
+ int ret;
+
+ if (cqe < 1)
+ return NULL;
+
+ cq = malloc(sizeof(*cq));
+ if (!cq)
+ return NULL;
+
+ /* Extra page for shared ring state */
+ cq->offset = dev->page_size;
+
+ if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
+ goto err;
+
+ cqe = align_next_power2(cqe);
+
+ if (pvrdma_alloc_cq_buf(dev, cq, &cq->buf, cqe))
+ goto err;
+
+ cq->ring_state = cq->buf.buf;
+
+ cmd.buf_addr = (uintptr_t) cq->buf.buf;
+ cmd.buf_size = cq->buf.length;
+ ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
+ &cq->ibv_cq, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (ret)
+ goto err_buf;
+
+ cq->cqn = resp.cqn;
+ cq->cqe_cnt = cq->ibv_cq.cqe;
+
+ return &cq->ibv_cq;
+
+err_buf:
+ pvrdma_free_buf(&cq->buf);
+err:
+ free(cq);
+
+ return NULL;
+}
+
+int pvrdma_destroy_cq(struct ibv_cq *cq)
+{
+ int ret;
+
+ ret = ibv_cmd_destroy_cq(cq);
+ if (ret)
+ return ret;
+
+ pvrdma_free_buf(&to_vcq(cq)->buf);
+ free(to_vcq(cq));
+
+ return 0;
+}
+
+int pvrdma_req_notify_cq(struct ibv_cq *ibcq, int solicited)
+{
+ struct pvrdma_context *ctx = to_vctx(ibcq->context);
+ struct pvrdma_cq *cq = to_vcq(ibcq);
+ unsigned int val = cq->cqn;
+
+ val |= solicited ? PVRDMA_UAR_CQ_ARM_SOL : PVRDMA_UAR_CQ_ARM;
+ pvrdma_write_uar_cq(ctx->uar, val);
+
+ return 0;
+}
--
2.7.4
* [PATCH 2/8] libpvrdma: Add ring traversal
From: Adit Ranadive @ 2016-11-03 23:44 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA
Cc: Adit Ranadive
In-Reply-To: <1478216677-6150-1-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Add the shared ring-state structures and index helpers that CQs and QPs
use to traverse the CQE/WQE rings.
Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
---
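
Note on the index scheme in the header below: the helpers keep each index in the range 0 .. 2 * max_elems - 1, so the low bits select a slot and the next bit acts as a generation flag. That lets "empty" (tail == head) be told apart from "full" (tail == head ^ max_elems) without sacrificing a slot. The following is a minimal standalone sketch of that scheme under the assumption that max_elems is a power of two; struct demo_ring and the demo_* helpers are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

struct demo_ring {
	uint32_t prod_tail; /* next index the producer writes */
	uint32_t cons_head; /* next index the consumer reads */
};

/* Advance an index modulo 2 * n; wrapping past n toggles the generation bit. */
static uint32_t demo_inc(uint32_t idx, uint32_t n)
{
	assert(n != 0 && (n & (n - 1)) == 0); /* n must be a power of two */
	return (idx + 1) & ((n << 1) - 1);
}

/* Slot in the backing array: the index with the generation bit masked off. */
static uint32_t demo_slot(uint32_t idx, uint32_t n)
{
	return idx & (n - 1);
}

/* Empty: same slot and same generation. */
static int demo_empty(const struct demo_ring *r)
{
	return r->prod_tail == r->cons_head;
}

/* Full: same slot but opposite generation. */
static int demo_full(const struct demo_ring *r, uint32_t n)
{
	return r->prod_tail == (r->cons_head ^ n);
}

/* Number of entries currently queued. */
static uint32_t demo_used(const struct demo_ring *r, uint32_t n)
{
	return (r->prod_tail - r->cons_head) & ((n << 1) - 1);
}
```

With n = 4, four pushes from an empty ring leave prod_tail at 4 and cons_head at 0: both map to slot 0, but the generation bits differ, so the ring reads as full rather than empty.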
providers/pvrdma/pvrdma_ring.h | 136 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 136 insertions(+)
create mode 100644 providers/pvrdma/pvrdma_ring.h
diff --git a/providers/pvrdma/pvrdma_ring.h b/providers/pvrdma/pvrdma_ring.h
new file mode 100644
index 0000000..e99a551
--- /dev/null
+++ b/providers/pvrdma/pvrdma_ring.h
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program in the file COPYING. If not, write to the
+ * Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+ * Boston, MA 02110-1301, USA.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __PVRDMA_RING_H__
+#define __PVRDMA_RING_H__
+
+#include <linux/types.h>
+
+#define PVRDMA_INVALID_IDX -1 /* Invalid index. */
+#define atomic_read(_x) (*(_x))
+#define atomic_set(_x, _y) (*(_x) = (_y))
+
+typedef uint32_t atomic_t;
+
+struct pvrdma_ring {
+ atomic_t prod_tail; /* Producer tail. */
+ atomic_t cons_head; /* Consumer head. */
+};
+
+struct pvrdma_ring_state {
+ struct pvrdma_ring tx; /* Tx ring. */
+ struct pvrdma_ring rx; /* Rx ring. */
+};
+
+static inline int pvrdma_idx_valid(__u32 idx, __u32 max_elems)
+{
+ /* Generates fewer instructions than a less-than. */
+ return (idx & ~((max_elems << 1) - 1)) == 0;
+}
+
+static inline __s32 pvrdma_idx(atomic_t *var, __u32 max_elems)
+{
+ const unsigned idx = atomic_read(var);
+
+ if (pvrdma_idx_valid(idx, max_elems))
+ return idx & (max_elems - 1);
+ return PVRDMA_INVALID_IDX;
+}
+
+static inline void pvrdma_idx_ring_inc(atomic_t *var, __u32 max_elems)
+{
+ __u32 idx = atomic_read(var) + 1; /* Increment. */
+
+ idx &= (max_elems << 1) - 1; /* Modulo size, flip gen. */
+ atomic_set(var, idx);
+}
+
+static inline __s32 pvrdma_idx_ring_has_space(const struct pvrdma_ring *r,
+ __u32 max_elems, __u32 *out_tail)
+{
+ const __u32 tail = atomic_read(&r->prod_tail);
+ const __u32 head = atomic_read(&r->cons_head);
+
+ if (pvrdma_idx_valid(tail, max_elems) &&
+ pvrdma_idx_valid(head, max_elems)) {
+ *out_tail = tail & (max_elems - 1);
+ return tail != (head ^ max_elems);
+ }
+ return PVRDMA_INVALID_IDX;
+}
+
+static inline __s32 pvrdma_idx_ring_has_data(const struct pvrdma_ring *r,
+ __u32 max_elems, __u32 *out_head)
+{
+ const __u32 tail = atomic_read(&r->prod_tail);
+ const __u32 head = atomic_read(&r->cons_head);
+
+ if (pvrdma_idx_valid(tail, max_elems) &&
+ pvrdma_idx_valid(head, max_elems)) {
+ *out_head = head & (max_elems - 1);
+ return tail != head;
+ }
+ return PVRDMA_INVALID_IDX;
+}
+
+static inline __s32 pvrdma_idx_ring_is_valid_idx(const struct pvrdma_ring *r,
+ __u32 max_elems, __u32 *idx)
+{
+ const __u32 tail = atomic_read(&r->prod_tail);
+ const __u32 head = atomic_read(&r->cons_head);
+
+ if (pvrdma_idx_valid(tail, max_elems) &&
+ pvrdma_idx_valid(head, max_elems) &&
+ pvrdma_idx_valid(*idx, max_elems)) {
+ if (tail > head && (*idx < tail && *idx >= head))
+ return 1;
+ else if (head > tail && (*idx >= head || *idx < tail))
+ return 1;
+ }
+ return 0;
+}
+
+#endif /* __PVRDMA_RING_H__ */
--
2.7.4