* [patch] qedr: Unlock on error in qedr_post_send()
From: Dan Carpenter @ 2016-11-15 9:51 UTC (permalink / raw)
To: Doug Ledford, Ram Amrani
Cc: Sean Hefty, Hal Rosenstock, Rajesh Borundia, linux-rdma,
kernel-janitors
If "wr" is NULL then we return without unlocking. I added an unlock
label and changed the direct returns to gotos.
Fixes: afa0e13be754 ("qedr: Add support for data path")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index a615142..01d05b0 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2973,17 +2973,18 @@ int qedr_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
if ((qp->state == QED_ROCE_QP_STATE_RESET) ||
(qp->state == QED_ROCE_QP_STATE_ERR)) {
- spin_unlock_irqrestore(&qp->q_lock, flags);
*bad_wr = wr;
DP_DEBUG(dev, QEDR_MSG_CQ,
"QP in wrong state! QP icid=0x%x state %d\n",
qp->icid, qp->state);
- return -EINVAL;
+ rc = -EINVAL;
+ goto unlock;
}
if (!wr) {
DP_ERR(dev, "Got an empty post send.\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto unlock;
}
while (wr) {
@@ -3012,6 +3013,7 @@ int qedr_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
/* Make sure write sticks */
mmiowb();
+unlock:
spin_unlock_irqrestore(&qp->q_lock, flags);
return rc;
^ permalink raw reply related
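The single-exit pattern this patch adopts can be sketched in plain C as
below. This is only a user-space illustration under stated assumptions,
not the driver code: struct demo_qp, demo_lock() and demo_unlock() are
hypothetical stand-ins for the QP, spin_lock_irqsave() and
spin_unlock_irqrestore().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue pair guarded by a lock. */
struct demo_qp {
	int locked;   /* 1 while the simulated lock is held */
	int in_error; /* nonzero if the QP cannot accept work */
};

static void demo_lock(struct demo_qp *qp)   { qp->locked = 1; }
static void demo_unlock(struct demo_qp *qp) { qp->locked = 0; }

/* Every exit taken after demo_lock() goes through the unlock label. */
int demo_post_send(struct demo_qp *qp, const int *wr)
{
	int rc = 0;

	demo_lock(qp);

	if (qp->in_error) {
		rc = -EINVAL;
		goto unlock;
	}

	if (!wr) {
		rc = -EINVAL;
		goto unlock;
	}

	/* ... the work request would be processed here ... */

unlock:
	demo_unlock(qp);
	return rc;
}
```

With a single unlock site, a new error check added later cannot forget
to drop the lock as long as it uses the same goto.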
* Re: [PATCH] IB/mlx4: Rework special QP creation error path
From: Yishai Hadas @ 2016-11-15 8:48 UTC (permalink / raw)
To: Bart Van Assche
Cc: Doug Ledford, Yishai Hadas,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <3a41daa7-cfb8-7374-5dab-dfa3b1bf83ff-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
On 11/14/2016 6:44 PM, Bart Van Assche wrote:
> The special QP creation error path relies on offset_of(struct mlx4_ib_sqp,
> qp) == 0. Remove this assumption because that makes the QP creation
> code easier to understand.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
> drivers/infiniband/hw/mlx4/qp.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index 570bc86..b63d6be 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -644,7 +644,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
> int qpn;
> int err;
> struct ib_qp_cap backup_cap;
> - struct mlx4_ib_sqp *sqp;
> + struct mlx4_ib_sqp *sqp = NULL;
> struct mlx4_ib_qp *qp;
> enum mlx4_ib_qp_type qp_type = (enum mlx4_ib_qp_type) init_attr->qp_type;
> struct mlx4_ib_cq *mcq;
> @@ -933,7 +933,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
> mlx4_db_free(dev->dev, &qp->db);
>
> err:
> - if (!*caller_qp)
> + if (sqp)
> + kfree(sqp);
> + else if (!*caller_qp)
> kfree(qp);
> return err;
> }
>
Looks fine, thanks.
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
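The point of the reworked error path can be sketched in plain C as
below. This is a simplified user-space illustration, not the mlx4 code:
struct demo_qp, struct demo_sqp and demo_create_qp() are hypothetical,
and the caller-supplied QP case (*caller_qp) is left out. The container
deliberately has a leading member so that the embedded qp does not sit
at offset 0.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_qp {
	int qpn;
};

struct demo_sqp {
	int header;        /* makes offsetof(struct demo_sqp, qp) non-zero */
	struct demo_qp qp; /* embedded QP, as in a "special" QP */
};

/* Returns 0 on success; on failure frees what this function allocated. */
int demo_create_qp(int special, int fail, struct demo_qp **out)
{
	struct demo_sqp *sqp = NULL;
	struct demo_qp *qp;

	if (special) {
		sqp = calloc(1, sizeof(*sqp));
		if (!sqp)
			return -ENOMEM;
		qp = &sqp->qp;
	} else {
		qp = calloc(1, sizeof(*qp));
		if (!qp)
			return -ENOMEM;
	}

	if (fail)
		goto err;

	*out = qp;
	return 0;

err:
	if (sqp)
		free(sqp); /* free the base of the allocation, not the member */
	else
		free(qp);
	return -EINVAL;
}
```

Freeing through sqp stays correct whatever the position of qp inside
the container, which is the assumption the patch removes.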
* [PATCH libibmad] configure.ac: Update AM_INIT_AUTOMAKE to use subdir-objects
From: Hal Rosenstock @ 2016-11-14 21:34 UTC (permalink / raw)
To: Weiny, Ira; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
With this option, the objects are placed into the subdirectory of
the build directory corresponding to the subdirectory of the source file.
Fixes the following warning:
Makefile.am:16: warning: source file 'src/dump.c' is in a subdirectory,
Makefile.am:16: but option 'subdir-objects' is disabled
automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled. For now, the corresponding output
automake: object file(s) will be placed in the top-level directory. However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.
Signed-off-by: Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
diff --git a/configure.ac b/configure.ac
index 843dbfd..6453e11 100644
--- a/configure.ac
+++ b/configure.ac
@@ -6,7 +6,7 @@ AC_CONFIG_SRCDIR([src/sa.c])
AC_CONFIG_AUX_DIR(config)
AC_CONFIG_MACRO_DIR(config)
AM_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE
+AM_INIT_AUTOMAKE([subdir-objects])
AC_SUBST(RELEASE, ${RELEASE:-unknown})
AC_SUBST(TARBALL, ${TARBALL:-${PACKAGE}-${VERSION}.tar.gz})
^ permalink raw reply related
* Re: [PATCH] IB/mlx4: Rework special QP creation error path
From: Laurence Oberman @ 2016-11-14 19:33 UTC (permalink / raw)
To: Bart Van Assche
Cc: Doug Ledford, Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <3a41daa7-cfb8-7374-5dab-dfa3b1bf83ff-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: "Doug Ledford" <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Yishai Hadas" <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Monday, November 14, 2016 11:44:11 AM
> Subject: [PATCH] IB/mlx4: Rework special QP creation error path
>
> The special QP creation error path relies on offset_of(struct mlx4_ib_sqp,
> qp) == 0. Remove this assumption because that makes the QP creation
> code easier to understand.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
> drivers/infiniband/hw/mlx4/qp.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/qp.c
> b/drivers/infiniband/hw/mlx4/qp.c
> index 570bc86..b63d6be 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -644,7 +644,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev,
> struct ib_pd *pd,
> int qpn;
> int err;
> struct ib_qp_cap backup_cap;
> - struct mlx4_ib_sqp *sqp;
> + struct mlx4_ib_sqp *sqp = NULL;
> struct mlx4_ib_qp *qp;
> enum mlx4_ib_qp_type qp_type = (enum mlx4_ib_qp_type) init_attr->qp_type;
> struct mlx4_ib_cq *mcq;
> @@ -933,7 +933,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev,
> struct ib_pd *pd,
> mlx4_db_free(dev->dev, &qp->db);
>
> err:
> - if (!*caller_qp)
> + if (sqp)
> + kfree(sqp);
> + else if (!*caller_qp)
> kfree(qp);
> return err;
> }
> --
> 2.10.1
>
This looks fine to me.
Reviewed-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
^ permalink raw reply
* [PATCH rdma-next 6/6] IB/mlx5: Add support to match inner packet fields
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add support for matching tunneled packet fields, i.e. matching the
headers of the inner packet. An inner spec is identified by a type that
is the bitwise OR of the original header type and IB_FLOW_SPEC_INNER.
The combination IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL does not
need to be checked here, because the IB core already rejects it.
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 131 +++++++++++++++++++++++---------------
1 file changed, 78 insertions(+), 53 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a833f45..65eb5e0 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1502,6 +1502,22 @@ static void set_proto(void *outer_c, void *outer_v, u8 mask, u8 val)
MLX5_SET(fte_match_set_lyr_2_4, outer_v, ip_protocol, val);
}
+static void set_flow_label(void *misc_c, void *misc_v, u8 mask, u8 val,
+ bool inner)
+{
+ if (inner) {
+ MLX5_SET(fte_match_set_misc,
+ misc_c, inner_ipv6_flow_label, mask);
+ MLX5_SET(fte_match_set_misc,
+ misc_v, inner_ipv6_flow_label, val);
+ } else {
+ MLX5_SET(fte_match_set_misc,
+ misc_c, outer_ipv6_flow_label, mask);
+ MLX5_SET(fte_match_set_misc,
+ misc_v, outer_ipv6_flow_label, val);
+ }
+}
+
static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
{
MLX5_SET(fte_match_set_lyr_2_4, outer_c, ip_ecn, mask);
@@ -1528,155 +1544,164 @@ static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
static int parse_flow_attr(u32 *match_c, u32 *match_v,
const union ib_flow_spec *ib_spec)
{
- void *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
- outer_headers);
- void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
- outer_headers);
void *misc_params_c = MLX5_ADDR_OF(fte_match_param, match_c,
misc_parameters);
void *misc_params_v = MLX5_ADDR_OF(fte_match_param, match_v,
misc_parameters);
+ void *headers_c;
+ void *headers_v;
+
+ if (ib_spec->type & IB_FLOW_SPEC_INNER) {
+ headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
+ inner_headers);
+ headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
+ inner_headers);
+ } else {
+ headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
+ outer_headers);
+ headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
+ outer_headers);
+ }
- switch (ib_spec->type) {
+ switch (ib_spec->type & ~IB_FLOW_SPEC_INNER) {
case IB_FLOW_SPEC_ETH:
if (FIELDS_NOT_SUPPORTED(ib_spec->eth.mask, LAST_ETH_FIELD))
return -ENOTSUPP;
- ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
dmac_47_16),
ib_spec->eth.mask.dst_mac);
- ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
dmac_47_16),
ib_spec->eth.val.dst_mac);
- ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
smac_47_16),
ib_spec->eth.mask.src_mac);
- ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
smac_47_16),
ib_spec->eth.val.src_mac);
if (ib_spec->eth.mask.vlan_tag) {
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
vlan_tag, 1);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
vlan_tag, 1);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
first_vid, ntohs(ib_spec->eth.mask.vlan_tag));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
first_vid, ntohs(ib_spec->eth.val.vlan_tag));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
first_cfi,
ntohs(ib_spec->eth.mask.vlan_tag) >> 12);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
first_cfi,
ntohs(ib_spec->eth.val.vlan_tag) >> 12);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
first_prio,
ntohs(ib_spec->eth.mask.vlan_tag) >> 13);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
first_prio,
ntohs(ib_spec->eth.val.vlan_tag) >> 13);
}
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
ethertype, ntohs(ib_spec->eth.mask.ether_type));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
ethertype, ntohs(ib_spec->eth.val.ether_type));
break;
case IB_FLOW_SPEC_IPV4:
if (FIELDS_NOT_SUPPORTED(ib_spec->ipv4.mask, LAST_IPV4_FIELD))
return -ENOTSUPP;
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
ethertype, 0xffff);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
ethertype, ETH_P_IP);
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
src_ipv4_src_ipv6.ipv4_layout.ipv4),
&ib_spec->ipv4.mask.src_ip,
sizeof(ib_spec->ipv4.mask.src_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
src_ipv4_src_ipv6.ipv4_layout.ipv4),
&ib_spec->ipv4.val.src_ip,
sizeof(ib_spec->ipv4.val.src_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
&ib_spec->ipv4.mask.dst_ip,
sizeof(ib_spec->ipv4.mask.dst_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
&ib_spec->ipv4.val.dst_ip,
sizeof(ib_spec->ipv4.val.dst_ip));
- set_tos(outer_headers_c, outer_headers_v,
+ set_tos(headers_c, headers_v,
ib_spec->ipv4.mask.tos, ib_spec->ipv4.val.tos);
- set_proto(outer_headers_c, outer_headers_v,
+ set_proto(headers_c, headers_v,
ib_spec->ipv4.mask.proto, ib_spec->ipv4.val.proto);
break;
case IB_FLOW_SPEC_IPV6:
if (FIELDS_NOT_SUPPORTED(ib_spec->ipv6.mask, LAST_IPV6_FIELD))
return -ENOTSUPP;
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c,
ethertype, 0xffff);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v,
ethertype, ETH_P_IPV6);
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
src_ipv4_src_ipv6.ipv6_layout.ipv6),
&ib_spec->ipv6.mask.src_ip,
sizeof(ib_spec->ipv6.mask.src_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
src_ipv4_src_ipv6.ipv6_layout.ipv6),
&ib_spec->ipv6.val.src_ip,
sizeof(ib_spec->ipv6.val.src_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
&ib_spec->ipv6.mask.dst_ip,
sizeof(ib_spec->ipv6.mask.dst_ip));
- memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+ memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
&ib_spec->ipv6.val.dst_ip,
sizeof(ib_spec->ipv6.val.dst_ip));
- set_tos(outer_headers_c, outer_headers_v,
+ set_tos(headers_c, headers_v,
ib_spec->ipv6.mask.traffic_class,
ib_spec->ipv6.val.traffic_class);
- set_proto(outer_headers_c, outer_headers_v,
+ set_proto(headers_c, headers_v,
ib_spec->ipv6.mask.next_hdr,
ib_spec->ipv6.val.next_hdr);
- MLX5_SET(fte_match_set_misc, misc_params_c,
- outer_ipv6_flow_label,
- ntohl(ib_spec->ipv6.mask.flow_label));
- MLX5_SET(fte_match_set_misc, misc_params_v,
- outer_ipv6_flow_label,
- ntohl(ib_spec->ipv6.val.flow_label));
+ set_flow_label(misc_params_c, misc_params_v,
+ ntohl(ib_spec->ipv6.mask.flow_label),
+ ntohl(ib_spec->ipv6.val.flow_label),
+ ib_spec->type & IB_FLOW_SPEC_INNER);
+
break;
case IB_FLOW_SPEC_TCP:
if (FIELDS_NOT_SUPPORTED(ib_spec->tcp_udp.mask,
LAST_TCP_UDP_FIELD))
return -ENOTSUPP;
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, ip_protocol,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol,
0xff);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, ip_protocol,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
IPPROTO_TCP);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, tcp_sport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, tcp_sport,
ntohs(ib_spec->tcp_udp.mask.src_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, tcp_sport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, tcp_sport,
ntohs(ib_spec->tcp_udp.val.src_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, tcp_dport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, tcp_dport,
ntohs(ib_spec->tcp_udp.mask.dst_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, tcp_dport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, tcp_dport,
ntohs(ib_spec->tcp_udp.val.dst_port));
break;
case IB_FLOW_SPEC_UDP:
@@ -1684,19 +1709,19 @@ static int parse_flow_attr(u32 *match_c, u32 *match_v,
LAST_TCP_UDP_FIELD))
return -ENOTSUPP;
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, ip_protocol,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol,
0xff);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, ip_protocol,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
IPPROTO_UDP);
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, udp_sport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, udp_sport,
ntohs(ib_spec->tcp_udp.mask.src_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, udp_sport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, udp_sport,
ntohs(ib_spec->tcp_udp.val.src_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c, udp_dport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, udp_dport,
ntohs(ib_spec->tcp_udp.mask.dst_port));
- MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, udp_dport,
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, udp_dport,
ntohs(ib_spec->tcp_udp.val.dst_port));
break;
case IB_FLOW_SPEC_VXLAN_TUNNEL:
--
2.7.4
^ permalink raw reply related
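The way parse_flow_attr() picks the inner or outer header area from the
spec type can be sketched in plain C as below. This is a minimal
illustration under stated assumptions: struct demo_match and
demo_parse_spec() are hypothetical, and only the numeric type values
(0x20, 0x30, 0x100) are taken from the series.

```c
#include <stdint.h>

#define DEMO_SPEC_ETH   0x20u
#define DEMO_SPEC_IPV4  0x30u
#define DEMO_SPEC_INNER 0x100u

/* Hypothetical match buffer with separate outer and inner areas. */
struct demo_match {
	uint16_t outer_ethertype;
	uint16_t inner_ethertype;
};

/* Write the ethertype into the area selected by the INNER flag. */
int demo_parse_spec(struct demo_match *m, uint32_t type, uint16_t ethertype)
{
	uint16_t *field = (type & DEMO_SPEC_INNER) ? &m->inner_ethertype
						   : &m->outer_ethertype;

	/* Dispatch on the base type, with the INNER flag stripped. */
	switch (type & ~DEMO_SPEC_INNER) {
	case DEMO_SPEC_ETH:
	case DEMO_SPEC_IPV4:
		*field = ethertype;
		return 0;
	default:
		return -1; /* unknown spec type */
	}
}
```

Choosing the destination once, before the switch, is what lets every
case body stay identical for inner and outer headers.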
* [PATCH rdma-next 5/6] IB/core: Introduce inner flow steering
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
For a tunneled packet which contains external and internal headers,
we refer to the external headers as "outer fields" and the internal
headers as "inner fields".
Example of a tunneled packet:
{ L2 | L3 | L4 | tunnel header | L2 | L3 | L4 | data }
     |    |    |               |    |    |    |
{         outer fields         }{ inner fields }
This patch introduces a new flag for flow steering rules
- IB_FLOW_SPEC_INNER - which specifies that the rule applies
to the inner fields, rather than to the outer fields of the packet.
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/uverbs_cmd.c | 4 +++-
include/rdma/ib_verbs.h | 17 +++++++++--------
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 561010a..10fb325 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3124,8 +3124,10 @@ static int kern_spec_to_ib_spec(struct ib_uverbs_flow_spec *kern_spec,
kern_spec_val = (void *)kern_spec +
sizeof(struct ib_uverbs_flow_spec_hdr);
kern_spec_mask = kern_spec_val + kern_filter_sz;
+ if (ib_spec->type == (IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL))
+ return -EINVAL;
- switch (ib_spec->type) {
+ switch (ib_spec->type & ~IB_FLOW_SPEC_INNER) {
case IB_FLOW_SPEC_ETH:
ib_filter_sz = offsetof(struct ib_flow_eth_filter, real_sz);
actual_filter_sz = spec_filter_size(kern_spec_mask,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 81c5c80..195a03e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1601,9 +1601,10 @@ enum ib_flow_spec_type {
IB_FLOW_SPEC_TCP = 0x40,
IB_FLOW_SPEC_UDP = 0x41,
IB_FLOW_SPEC_VXLAN_TUNNEL = 0x50,
+ IB_FLOW_SPEC_INNER = 0x100,
};
#define IB_FLOW_SPEC_LAYER_MASK 0xF0
-#define IB_FLOW_SPEC_SUPPORT_LAYERS 4
+#define IB_FLOW_SPEC_SUPPORT_LAYERS 8
/* Flow steering rule priority is set according to it's domain.
* Lower domain value means higher priority.
@@ -1631,7 +1632,7 @@ struct ib_flow_eth_filter {
};
struct ib_flow_spec_eth {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_eth_filter val;
struct ib_flow_eth_filter mask;
@@ -1645,7 +1646,7 @@ struct ib_flow_ib_filter {
};
struct ib_flow_spec_ib {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_ib_filter val;
struct ib_flow_ib_filter mask;
@@ -1670,7 +1671,7 @@ struct ib_flow_ipv4_filter {
};
struct ib_flow_spec_ipv4 {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_ipv4_filter val;
struct ib_flow_ipv4_filter mask;
@@ -1688,7 +1689,7 @@ struct ib_flow_ipv6_filter {
};
struct ib_flow_spec_ipv6 {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_ipv6_filter val;
struct ib_flow_ipv6_filter mask;
@@ -1702,7 +1703,7 @@ struct ib_flow_tcp_udp_filter {
};
struct ib_flow_spec_tcp_udp {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_tcp_udp_filter val;
struct ib_flow_tcp_udp_filter mask;
@@ -1717,7 +1718,7 @@ struct ib_flow_tunnel_filter {
* the tunnel_id from val has the vni value
*/
struct ib_flow_spec_tunnel {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
struct ib_flow_tunnel_filter val;
struct ib_flow_tunnel_filter mask;
@@ -1725,7 +1726,7 @@ struct ib_flow_spec_tunnel {
union ib_flow_spec {
struct {
- enum ib_flow_spec_type type;
+ u32 type;
u16 size;
};
struct ib_flow_spec_eth eth;
--
2.7.4
^ permalink raw reply related
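The type handling added to kern_spec_to_ib_spec() can be sketched in
plain C as below. This is a user-space illustration only:
demo_decode_type() is a hypothetical helper, while the numeric values
(0x41, 0x50, 0x100) come from the enum in this patch. It also hints at
why the type fields become u32: a value with the flag ORed in is not
one of the listed enumerators.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_SPEC_UDP          0x41u
#define DEMO_SPEC_VXLAN_TUNNEL 0x50u
#define DEMO_SPEC_INNER        0x100u

/*
 * Split a spec type into its base type and the inner flag.
 * Returns 0 on success, or -EINVAL for the one combination the
 * patch rejects: an "inner" Vxlan tunnel header.
 */
int demo_decode_type(uint32_t type, uint32_t *base, int *inner)
{
	if (type == (DEMO_SPEC_INNER | DEMO_SPEC_VXLAN_TUNNEL))
		return -EINVAL;

	*inner = (type & DEMO_SPEC_INNER) != 0;
	*base = type & ~DEMO_SPEC_INNER;
	return 0;
}
```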
* [PATCH rdma-next 4/6] IB/mlx5: Support Vxlan tunneling specification
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add support for receiving specific Vxlan packets on ConnectX-4.
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..a833f45 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1515,6 +1515,7 @@ static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
#define LAST_IPV4_FIELD tos
#define LAST_IPV6_FIELD traffic_class
#define LAST_TCP_UDP_FIELD src_port
+#define LAST_TUNNEL_FIELD tunnel_id
/* Field is the last supported field */
#define FIELDS_NOT_SUPPORTED(filter, field)\
@@ -1698,6 +1699,16 @@ static int parse_flow_attr(u32 *match_c, u32 *match_v,
MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v, udp_dport,
ntohs(ib_spec->tcp_udp.val.dst_port));
break;
+ case IB_FLOW_SPEC_VXLAN_TUNNEL:
+ if (FIELDS_NOT_SUPPORTED(ib_spec->tunnel.mask,
+ LAST_TUNNEL_FIELD))
+ return -ENOTSUPP;
+
+ MLX5_SET(fte_match_set_misc, misc_params_c, vxlan_vni,
+ ntohl(ib_spec->tunnel.mask.tunnel_id));
+ MLX5_SET(fte_match_set_misc, misc_params_v, vxlan_vni,
+ ntohl(ib_spec->tunnel.val.tunnel_id));
+ break;
default:
return -EINVAL;
}
--
2.7.4
^ permalink raw reply related
* [PATCH rdma-next 3/6] IB/uverbs: Add support for Vxlan protocol
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add ib_uverbs_flow_spec_tunnel to define the rule for matching Vxlan.
The type, size and reserved fields are identical to those of the other
protocols and are used to identify the spec.
The tunnel id is the vni value of the Vxlan protocol. It is used as
part of the steering rule and is limited by the mask.
The steering rule configured on the hardware matches on the vni
together with the other protocols in the rule.
As with the other protocols that we match, the unique fields of the
protocol are represented in the val and the mask.
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/uapi/rdma/ib_user_verbs.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25225eb..90ba5e8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -908,6 +908,23 @@ struct ib_uverbs_flow_spec_ipv6 {
struct ib_uverbs_flow_ipv6_filter mask;
};
+struct ib_uverbs_flow_tunnel_filter {
+ __be32 tunnel_id;
+};
+
+struct ib_uverbs_flow_spec_tunnel {
+ union {
+ struct ib_uverbs_flow_spec_hdr hdr;
+ struct {
+ __u32 type;
+ __u16 size;
+ __u16 reserved;
+ };
+ };
+ struct ib_uverbs_flow_tunnel_filter val;
+ struct ib_uverbs_flow_tunnel_filter mask;
+};
+
struct ib_uverbs_flow_attr {
__u32 type;
__u16 size;
--
2.7.4
^ permalink raw reply related
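How an application might fill such a tunnel spec can be sketched in
plain C as below. This is an illustration under stated assumptions:
struct demo_spec_tunnel mirrors the field layout of
ib_uverbs_flow_spec_tunnel without the union, demo_fill_vxlan_spec() is
a hypothetical helper, and the type value 0x50 is assumed to be the
same IB_FLOW_SPEC_VXLAN_TUNNEL value that the kernel enum in this
series uses.

```c
#include <arpa/inet.h> /* htonl() */
#include <stdint.h>
#include <string.h>

struct demo_tunnel_filter {
	uint32_t tunnel_id; /* big-endian, holds the 24-bit vni */
};

struct demo_spec_tunnel {
	uint32_t type;
	uint16_t size;
	uint16_t reserved;
	struct demo_tunnel_filter val;
	struct demo_tunnel_filter mask;
};

/* Fill a spec that matches exactly one vni. */
void demo_fill_vxlan_spec(struct demo_spec_tunnel *spec, uint32_t vni)
{
	memset(spec, 0, sizeof(*spec));
	spec->type = 0x50; /* assumed IB_FLOW_SPEC_VXLAN_TUNNEL value */
	spec->size = (uint16_t)sizeof(*spec);
	spec->val.tunnel_id = htonl(vni);
	spec->mask.tunnel_id = htonl(0xFFFFFF); /* match all 24 vni bits */
}
```

A mask with fewer bits set would match a range of vni values instead of
a single one.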
* [PATCH rdma-next 2/6] IB/core: Align structure ib_flow_spec_type
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Align the indentation of enum ib_flow_spec_type after adding a new
definition.
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/rdma/ib_verbs.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9dd8309..81c5c80 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1592,14 +1592,14 @@ enum ib_flow_attr_type {
/* Supported steering header types */
enum ib_flow_spec_type {
/* L2 headers*/
- IB_FLOW_SPEC_ETH = 0x20,
- IB_FLOW_SPEC_IB = 0x22,
+ IB_FLOW_SPEC_ETH = 0x20,
+ IB_FLOW_SPEC_IB = 0x22,
/* L3 header*/
- IB_FLOW_SPEC_IPV4 = 0x30,
- IB_FLOW_SPEC_IPV6 = 0x31,
+ IB_FLOW_SPEC_IPV4 = 0x30,
+ IB_FLOW_SPEC_IPV6 = 0x31,
/* L4 headers*/
- IB_FLOW_SPEC_TCP = 0x40,
- IB_FLOW_SPEC_UDP = 0x41,
+ IB_FLOW_SPEC_TCP = 0x40,
+ IB_FLOW_SPEC_UDP = 0x41,
IB_FLOW_SPEC_VXLAN_TUNNEL = 0x50,
};
#define IB_FLOW_SPEC_LAYER_MASK 0xF0
--
2.7.4
^ permalink raw reply related
* [PATCH rdma-next 1/6] IB/core: Add flow spec tunneling support
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb,
Moses Reuben
In-Reply-To: <1479143092-11723-1-git-send-email-leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
In order to support tunneling that can be used by the QP, add
struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter. Both can
also be used for other IP or UDP based tunneling protocols (e.g. NVGRE,
GRE, etc.).
The IB_FLOW_SPEC_VXLAN_TUNNEL flow specification type is added to use
this functionality and match specific Vxlan packets.
Similarly to the IPv6 case, we check the vni value for overflow by
comparing it with the maximum value (24 bits).
Signed-off-by: Moses Reuben <mosesr-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/uverbs_cmd.c | 15 +++++++++++++++
include/rdma/ib_verbs.h | 19 ++++++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..561010a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3175,6 +3175,21 @@ static int kern_spec_to_ib_spec(struct ib_uverbs_flow_spec *kern_spec,
memcpy(&ib_spec->tcp_udp.val, kern_spec_val, actual_filter_sz);
memcpy(&ib_spec->tcp_udp.mask, kern_spec_mask, actual_filter_sz);
break;
+ case IB_FLOW_SPEC_VXLAN_TUNNEL:
+ ib_filter_sz = offsetof(struct ib_flow_tunnel_filter, real_sz);
+ actual_filter_sz = spec_filter_size(kern_spec_mask,
+ kern_filter_sz,
+ ib_filter_sz);
+ if (actual_filter_sz <= 0)
+ return -EINVAL;
+ ib_spec->tunnel.size = sizeof(struct ib_flow_spec_tunnel);
+ memcpy(&ib_spec->tunnel.val, kern_spec_val, actual_filter_sz);
+ memcpy(&ib_spec->tunnel.mask, kern_spec_mask, actual_filter_sz);
+
+ if ((ntohl(ib_spec->tunnel.mask.tunnel_id)) >= BIT(24) ||
+ (ntohl(ib_spec->tunnel.val.tunnel_id)) >= BIT(24))
+ return -EINVAL;
+ break;
default:
return -EINVAL;
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..9dd8309 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1599,7 +1599,8 @@ enum ib_flow_spec_type {
IB_FLOW_SPEC_IPV6 = 0x31,
/* L4 headers*/
IB_FLOW_SPEC_TCP = 0x40,
- IB_FLOW_SPEC_UDP = 0x41
+ IB_FLOW_SPEC_UDP = 0x41,
+ IB_FLOW_SPEC_VXLAN_TUNNEL = 0x50,
};
#define IB_FLOW_SPEC_LAYER_MASK 0xF0
#define IB_FLOW_SPEC_SUPPORT_LAYERS 4
@@ -1707,6 +1708,21 @@ struct ib_flow_spec_tcp_udp {
struct ib_flow_tcp_udp_filter mask;
};
+struct ib_flow_tunnel_filter {
+ __be32 tunnel_id;
+ u8 real_sz[0];
+};
+
+/* ib_flow_spec_tunnel describes the Vxlan tunnel
+ * the tunnel_id from val has the vni value
+ */
+struct ib_flow_spec_tunnel {
+ enum ib_flow_spec_type type;
+ u16 size;
+ struct ib_flow_tunnel_filter val;
+ struct ib_flow_tunnel_filter mask;
+};
+
union ib_flow_spec {
struct {
enum ib_flow_spec_type type;
@@ -1717,6 +1733,7 @@ union ib_flow_spec {
struct ib_flow_spec_ipv4 ipv4;
struct ib_flow_spec_tcp_udp tcp_udp;
struct ib_flow_spec_ipv6 ipv6;
+ struct ib_flow_spec_tunnel tunnel;
};
struct ib_flow_attr {
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 0/6] Add support for Vxlan protocol for the steering rules
From: Leon Romanovsky @ 2016-11-14 17:04 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky, Maor Gottlieb
From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Virtual Extensible LAN (Vxlan) is a proposed encapsulation protocol for
running an overlay network on existing Layer 3 infrastructure. An
overlay network is a virtual network that is built on top of existing
Layer 2 and Layer 3 network technologies to support elastic compute
architectures. Vxlan makes it easier for network engineers to scale
out a cloud computing environment while logically isolating cloud apps
and tenants.
This patchset introduces support for Vxlan steering rules in IB/core and
in the mlx5 device implementation. The application will be able to send
packets with Vxlan tunneling and configure a steering rule that directs
the matching received stream to the QP.
The rule will contain inner and outer parts for the inner and outer
protocol headers respectively. The matching is based on the rule
specification and includes the Vxlan header itself, so an application
will be able to send several Vxlan streams with a distinct vni for each
stream.
To steer traffic according to a Vxlan tunnel ID, use the
ibv_cmd_create_flow call. At the end of the struct, the application
should append one spec per feature.
Four specifications are added. The first is the Vxlan (tunnel) spec,
which is described below. The remaining three are the L2, L3, and L4
specs.
A range of tunnel IDs can be steered by using the filter mask. The
ibv_flow_tunnel_filter struct is defined as follows:
struct ibv_flow_tunnel_filter {
uint32_t tunnel_id;
};
The Vxlan tunnel spec is defined as follows:
struct ibv_flow_spec_tunnel {
enum ibv_flow_spec_type type;
uint16_t size;
struct ibv_flow_tunnel_filter val;
struct ibv_flow_tunnel_filter mask;
};
Where:
* type - Must be set to IBV_FLOW_SPEC_VXLAN_TUNNEL to define the spec
as a Vxlan spec.
* size - The size of the ibv_kern_spec_tunnel.
* tunnel_id - The tunnel ID as it appears in the Vxlan header.
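As a rough illustration of the fields listed above, the following sketch
fills a tunnel spec for a single vni. It is not taken from the patchset:
the structs are local stand-ins, fill_vxlan_spec() is a hypothetical
helper, the 0x50 value is assumed to mirror the kernel's
IB_FLOW_SPEC_VXLAN_TUNNEL, and storing tunnel_id in network byte order is
assumed from the ntohl() checks in the uverbs patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the structures described in the cover letter;
 * a real application would take them from the verbs headers. */
enum ibv_flow_spec_type {
	/* Assumed to match the kernel's IB_FLOW_SPEC_VXLAN_TUNNEL */
	IBV_FLOW_SPEC_VXLAN_TUNNEL = 0x50,
};

struct ibv_flow_tunnel_filter {
	uint32_t tunnel_id;
};

struct ibv_flow_spec_tunnel {
	enum ibv_flow_spec_type type;
	uint16_t size;
	struct ibv_flow_tunnel_filter val;
	struct ibv_flow_tunnel_filter mask;
};

/* Hypothetical helper: fills a tunnel spec for one vni.
 * Returns 0 on success, -1 if vni or mask does not fit in 24 bits
 * (the same bound the kernel side checks with BIT(24)). */
static int fill_vxlan_spec(struct ibv_flow_spec_tunnel *spec,
			   uint32_t vni, uint32_t vni_mask)
{
	assert(spec != NULL);

	if (vni >= (1u << 24) || vni_mask >= (1u << 24))
		return -1;

	memset(spec, 0, sizeof(*spec));
	spec->type = IBV_FLOW_SPEC_VXLAN_TUNNEL;
	spec->size = sizeof(*spec);
	/* Assumption: the filter carries the ID in network byte order */
	spec->val.tunnel_id = htonl(vni);
	spec->mask.tunnel_id = htonl(vni_mask);
	return 0;
}
```

Calling fill_vxlan_spec(&spec, 42, 0xFFFFFF) would match vni 42 exactly,
while a narrower mask would match a range of tunnel IDs.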
Thanks,
Moses and Leon.
Available in the "topic/vxlan" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/vxlan
CC: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Moses Reuben (6):
IB/core: Add flow spec tunneling support
IB/core: Align structure ib_flow_spec_type
IB/uverbs: Add support for Vxlan protocol
IB/mlx5: Support Vxlan tunneling specification
IB/core: Introduce inner flow steering
IB/mlx5: Add support to match inner packet fields
drivers/infiniband/core/uverbs_cmd.c | 19 ++++-
drivers/infiniband/hw/mlx5/main.c | 142 ++++++++++++++++++++++-------------
include/rdma/ib_verbs.h | 44 +++++++----
include/uapi/rdma/ib_user_verbs.h | 17 +++++
4 files changed, 155 insertions(+), 67 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH] IB/mlx4: Rework special QP creation error path
From: Bart Van Assche @ 2016-11-14 16:44 UTC (permalink / raw)
To: Doug Ledford
Cc: Yishai Hadas, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
The special QP creation error path relies on offsetof(struct mlx4_ib_sqp,
qp) == 0. Remove this assumption because that makes the QP creation
code easier to understand.
Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/hw/mlx4/qp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 570bc86..b63d6be 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -644,7 +644,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
int qpn;
int err;
struct ib_qp_cap backup_cap;
- struct mlx4_ib_sqp *sqp;
+ struct mlx4_ib_sqp *sqp = NULL;
struct mlx4_ib_qp *qp;
enum mlx4_ib_qp_type qp_type = (enum mlx4_ib_qp_type) init_attr->qp_type;
struct mlx4_ib_cq *mcq;
@@ -933,7 +933,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
mlx4_db_free(dev->dev, &qp->db);
err:
- if (!*caller_qp)
+ if (sqp)
+ kfree(sqp);
+ else if (!*caller_qp)
kfree(qp);
return err;
}
--
2.10.1
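The error path change in this patch can be sketched in userspace as
follows. This is only an illustration under stated assumptions: the
struct and function names are hypothetical stand-ins for mlx4_ib_qp,
mlx4_ib_sqp and create_qp_common(), the caller-provided QP case is left
out, and -1 stands in for a real errno.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mlx4_ib_qp and mlx4_ib_sqp */
struct inner_qp {
	int qpn;
};

struct outer_sqp {
	struct inner_qp qp;	/* happens to be the first member */
	int pkey_index;
};

/* Allocates either a plain QP or a special QP wrapper. When 'fail' is
 * set, the error path runs; it frees the pointer that was actually
 * allocated instead of relying on &sqp->qp == sqp. */
static int create_qp_sketch(int special, int fail, struct inner_qp **out)
{
	struct outer_sqp *sqp = NULL;
	struct inner_qp *qp;

	if (special) {
		sqp = calloc(1, sizeof(*sqp));
		if (!sqp)
			return -1;
		qp = &sqp->qp;
	} else {
		qp = calloc(1, sizeof(*qp));
		if (!qp)
			return -1;
	}

	if (fail)
		goto err;

	*out = qp;
	return 0;

err:
	if (sqp)
		free(sqp);	/* correct for any offset of 'qp' */
	else
		free(qp);
	return -1;
}
```

Before the patch, the equivalent of free(qp) was used for both cases,
which is only valid while qp sits at offset 0 inside the wrapper.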
^ permalink raw reply related
* [bug report] qedr: Add support for QP verbs
From: Dan Carpenter @ 2016-11-14 13:03 UTC (permalink / raw)
To: Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello Ram Amrani,
The patch cecbcddf6461: "qedr: Add support for QP verbs" from Oct 10,
2016, leads to the following static checker warning:
drivers/infiniband/hw/qedr/verbs.c:2067 qedr_destroy_qp()
'0x5 | 0x1' has '0x1' set on both sides
drivers/infiniband/hw/qedr/verbs.c
2056 int qedr_destroy_qp(struct ib_qp *ibqp)
2057 {
2058 struct qedr_qp *qp = get_qedr_qp(ibqp);
2059 struct qedr_dev *dev = qp->dev;
2060 struct ib_qp_attr attr;
2061 int attr_mask = 0;
2062 int rc = 0;
2063
2064 DP_DEBUG(dev, QEDR_MSG_QP, "destroy qp: destroying %p, qp type=%d\n",
2065 qp, qp->qp_type);
2066
2067 if (qp->state != (QED_ROCE_QP_STATE_RESET | QED_ROCE_QP_STATE_ERR |
2068 QED_ROCE_QP_STATE_INIT)) {
These aren't bitfields, they're just numbers. This code is pretty
suspect. Not sure what was intended.
2069 attr.qp_state = IB_QPS_ERR;
2070 attr_mask |= IB_QP_STATE;
2071
2072 /* Change the QP state to ERROR */
2073 qedr_modify_qp(ibqp, &attr, attr_mask, NULL);
2074 }
regards,
dan carpenter
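To illustrate the warning above, here is a small standalone sketch. The
enumerator values are assumptions chosen to be consistent with the
checker's "0x5 | 0x1" message, the real definitions live in the qed
headers, and the "explicit" variant is only one possible reading of what
the code intended.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; plain sequential numbers, not bit flags */
enum qp_state {
	QP_STATE_RESET = 0,
	QP_STATE_INIT  = 1,
	QP_STATE_RTR   = 2,
	QP_STATE_RTS   = 3,
	QP_STATE_SQD   = 4,
	QP_STATE_ERR   = 5,
};

/* The flagged form: OR-ing the enumerators collapses to a single
 * number (0 | 5 | 1 == 5), so this only tests "state != ERR". */
static bool needs_err_transition_flagged(enum qp_state s)
{
	return s != (QP_STATE_RESET | QP_STATE_ERR | QP_STATE_INIT);
}

/* One possible intent: skip the transition for RESET, ERR and INIT */
static bool needs_err_transition_explicit(enum qp_state s)
{
	return s != QP_STATE_RESET &&
	       s != QP_STATE_ERR &&
	       s != QP_STATE_INIT;
}
```

With these values the two functions disagree for RESET and INIT: the
flagged form would still move such a QP to the error state.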
^ permalink raw reply
* RE: Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-14 12:05 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161114120153.GC4240-2ukJVAZIZ/Y@public.gmane.org>
> There is addition of 1 in mlx4_ib_create_cq():
> 192 entries = roundup_pow_of_two(entries + 1);
> 193 cq->ibcq.cqe = entries - 1;
I thought something else might hide there.
Thanks,
Ram
^ permalink raw reply
* Re: Configuration of cq->cqe is lower than entries by 1
From: Leon Romanovsky @ 2016-11-14 12:01 UTC (permalink / raw)
To: Amrani, Ram; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR07MB22076C5DE03F1939603C554CF8BC0-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
On Mon, Nov 14, 2016 at 11:07:53AM +0000, Amrani, Ram wrote:
> Hi Leon, All,
> While inspecting MLX code as well as other vendors' I see that the actual number of cq->cqe is configured to be less by 1 than 'entries'. Why is that?
There is addition of 1 in mlx4_ib_create_cq():
192 entries = roundup_pow_of_two(entries + 1);
193 cq->ibcq.cqe = entries - 1;
The same goes for mlx4_alloc_resize_buf, just earlier in the stack.
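The arithmetic in those two lines can be checked with a small userspace
sketch. roundup_pow2() below is a simplified stand-in for the kernel's
roundup_pow_of_two(), and usable_cqe() is a hypothetical name; both are
only meant for small positive inputs.

```c
#include <assert.h>

/* Simplified stand-in for roundup_pow_of_two(); valid for
 * 1 <= n <= 2^31, which is enough for this illustration. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Mirrors the two quoted lines: the value stored in cq->ibcq.cqe for
 * a given number of requested entries. */
static unsigned int usable_cqe(unsigned int entries)
{
	entries = roundup_pow2(entries + 1);
	return entries - 1;
}
```

Because one is added before rounding up, the stored value never drops
below the requested number of entries: a request for 127 stays at 127,
and a request for 128 becomes 255.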
>
> e.g.
> struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
> const struct ib_cq_init_attr *attr,
> struct ib_ucontext *context,
> struct ib_udata *udata)
> {
> ...
> cq->ibcq.cqe = entries - 1;
> ...
> }
>
>
> static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
> int entries)
> {
> ...
> cq->resize_buf->cqe = entries - 1; // this is later copied to cq->ibcq.cqe
> ...
> }
>
> Thanks,
> Ram
>
^ permalink raw reply
* Re: [PATCH v6 12/16] IB/pvrdma: Add Queue Pair support
From: Yuval Shaia @ 2016-11-14 11:34 UTC (permalink / raw)
To: Adit Ranadive
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
pv-drivers-pghWNbHTmq7QT0dZR+AlfA, jhansen-pghWNbHTmq7QT0dZR+AlfA,
asarwade-pghWNbHTmq7QT0dZR+AlfA,
georgezhang-pghWNbHTmq7QT0dZR+AlfA,
bryantan-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <6a643e92376856394d45638d80a90619d3abac37.1475458407.git.aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
On Sun, Oct 02, 2016 at 07:10:32PM -0700, Adit Ranadive wrote:
> This patch adds the ability to create, modify, query and destroy QPs. The
> PVRDMA device supports RC, UD and GSI QPs.
>
> Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Jorgen Hansen <jhansen-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: George Zhang <georgezhang-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Aditya Sarwade <asarwade-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Bryan Tan <bryantan-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
> ---
> Changes v5->v6:
> - Removed a duplicate include of the ABI header.
>
> Changes v4->v5:
> - Updated include for headers in UAPI folder.
> - Update to pvrdma_cmd_post for creating/destroying/querying/modifying QPs.
> - Use the pvrdma_sge struct when posting WRs/allocating QP memory.
> - Removed two set but unused variables.
>
> Changes v3->v4:
> - Removed an unnecessary switch case.
> - Unified the returns in pvrdma_create_qp to use one exit point.
> - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we need a lock to
> be held when calling this.
> - Updated to use wrapper for UAR write for QP.
> - Updated conversion function to func_name(dst, src) format.
> - Renamed max_gs to max_sg.
> - Renamed cap variable to req_cap in pvrdma_set_sq/rq_size.
> - Changed dev_warn to dev_warn_ratelimited in pvrdma_post_send/recv.
> - Added nesting locking for flushing CQs when destroying/resetting a QP.
> - Added missing ret value.
>
> Changes v2->v3:
> - Removed boolean in pvrdma_cmd_post.
> ---
> drivers/infiniband/hw/pvrdma/pvrdma_qp.c | 972 +++++++++++++++++++++++++++++++
> 1 file changed, 972 insertions(+)
> create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_qp.c
>
> diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
> new file mode 100644
> index 0000000..c8c01e5
> --- /dev/null
> +++ b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
> @@ -0,0 +1,972 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <asm/page.h>
> +#include <linux/io.h>
> +#include <linux/wait.h>
> +#include <rdma/ib_addr.h>
> +#include <rdma/ib_smi.h>
> +#include <rdma/ib_user_verbs.h>
> +
> +#include "pvrdma.h"
> +
> +static inline void get_cqs(struct pvrdma_qp *qp, struct pvrdma_cq **send_cq,
> + struct pvrdma_cq **recv_cq)
> +{
> + *send_cq = to_vcq(qp->ibqp.send_cq);
> + *recv_cq = to_vcq(qp->ibqp.recv_cq);
> +}
> +
> +static void pvrdma_lock_cqs(struct pvrdma_cq *scq, struct pvrdma_cq *rcq,
> + unsigned long *scq_flags,
> + unsigned long *rcq_flags)
> + __acquires(scq->cq_lock) __acquires(rcq->cq_lock)
> +{
> + if (scq == rcq) {
> + spin_lock_irqsave(&scq->cq_lock, *scq_flags);
> + __acquire(rcq->cq_lock);
> + } else if (scq->cq_handle < rcq->cq_handle) {
> + spin_lock_irqsave(&scq->cq_lock, *scq_flags);
> + spin_lock_irqsave_nested(&rcq->cq_lock, *rcq_flags,
> + SINGLE_DEPTH_NESTING);
> + } else {
> + spin_lock_irqsave(&rcq->cq_lock, *rcq_flags);
> + spin_lock_irqsave_nested(&scq->cq_lock, *scq_flags,
> + SINGLE_DEPTH_NESTING);
> + }
> +}
> +
> +static void pvrdma_unlock_cqs(struct pvrdma_cq *scq, struct pvrdma_cq *rcq,
> + unsigned long *scq_flags,
> + unsigned long *rcq_flags)
> + __releases(scq->cq_lock) __releases(rcq->cq_lock)
> +{
> + if (scq == rcq) {
> + __release(rcq->cq_lock);
> + spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> + } else if (scq->cq_handle < rcq->cq_handle) {
> + spin_unlock_irqrestore(&rcq->cq_lock, *rcq_flags);
> + spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> + } else {
> + spin_unlock_irqrestore(&scq->cq_lock, *scq_flags);
> + spin_unlock_irqrestore(&rcq->cq_lock, *rcq_flags);
> + }
> +}
> +
> +static void pvrdma_reset_qp(struct pvrdma_qp *qp)
> +{
> + struct pvrdma_cq *scq, *rcq;
> + unsigned long scq_flags, rcq_flags;
> +
> + /* Clean up cqes */
> + get_cqs(qp, &scq, &rcq);
> + pvrdma_lock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> + _pvrdma_flush_cqe(qp, scq);
> + if (scq != rcq)
> + _pvrdma_flush_cqe(qp, rcq);
> +
> + pvrdma_unlock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> + /*
> + * Reset queuepair. The checks are because usermode queuepairs won't
> + * have kernel ringstates.
> + */
> + if (qp->rq.ring) {
> + atomic_set(&qp->rq.ring->cons_head, 0);
> + atomic_set(&qp->rq.ring->prod_tail, 0);
> + }
> + if (qp->sq.ring) {
> + atomic_set(&qp->sq.ring->cons_head, 0);
> + atomic_set(&qp->sq.ring->prod_tail, 0);
> + }
> +}
> +
> +static int pvrdma_set_rq_size(struct pvrdma_dev *dev,
> + struct ib_qp_cap *req_cap,
> + struct pvrdma_qp *qp)
> +{
> + if (req_cap->max_recv_wr > dev->dsr->caps.max_qp_wr ||
> + req_cap->max_recv_sge > dev->dsr->caps.max_sge) {
> + dev_warn(&dev->pdev->dev, "recv queue size invalid\n");
> + return -EINVAL;
> + }
> +
> + qp->rq.wqe_cnt = roundup_pow_of_two(max(1U, req_cap->max_recv_wr));
> + qp->rq.max_sg = roundup_pow_of_two(max(1U, req_cap->max_recv_sge));
> +
> + /* Write back */
> + req_cap->max_recv_wr = qp->rq.wqe_cnt;
> + req_cap->max_recv_sge = qp->rq.max_sg;
> +
> + qp->rq.wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_rq_wqe_hdr) +
> + sizeof(struct pvrdma_sge) *
> + qp->rq.max_sg);
> + qp->npages_recv = (qp->rq.wqe_cnt * qp->rq.wqe_size + PAGE_SIZE - 1) /
> + PAGE_SIZE;
> +
> + return 0;
> +}
> +
> +static int pvrdma_set_sq_size(struct pvrdma_dev *dev, struct ib_qp_cap *req_cap,
> + enum ib_qp_type type, struct pvrdma_qp *qp)
> +{
> + if (req_cap->max_send_wr > dev->dsr->caps.max_qp_wr ||
> + req_cap->max_send_sge > dev->dsr->caps.max_sge) {
> + dev_warn(&dev->pdev->dev, "send queue size invalid\n");
> + return -EINVAL;
> + }
> +
> + qp->sq.wqe_cnt = roundup_pow_of_two(max(1U, req_cap->max_send_wr));
> + qp->sq.max_sg = roundup_pow_of_two(max(1U, req_cap->max_send_sge));
> +
> + /* Write back */
> + req_cap->max_send_wr = qp->sq.wqe_cnt;
> + req_cap->max_send_sge = qp->sq.max_sg;
> +
> + qp->sq.wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_sq_wqe_hdr) +
> + sizeof(struct pvrdma_sge) *
> + qp->sq.max_sg);
> + /* Note: one extra page for the header. */
> + qp->npages_send = 1 + (qp->sq.wqe_cnt * qp->sq.wqe_size +
> + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> + return 0;
> +}
> +
> +/**
> + * pvrdma_create_qp - create queue pair
> + * @pd: protection domain
> + * @init_attr: queue pair attributes
> + * @udata: user data
> + *
> + * @return: the ib_qp pointer on success, otherwise returns an errno.
> + */
> +struct ib_qp *pvrdma_create_qp(struct ib_pd *pd,
> + struct ib_qp_init_attr *init_attr,
> + struct ib_udata *udata)
> +{
> + struct pvrdma_qp *qp = NULL;
> + struct pvrdma_dev *dev = to_vdev(pd->device);
> + union pvrdma_cmd_req req;
> + union pvrdma_cmd_resp rsp;
> + struct pvrdma_cmd_create_qp *cmd = &req.create_qp;
> + struct pvrdma_cmd_create_qp_resp *resp = &rsp.create_qp_resp;
> + struct pvrdma_create_qp ucmd;
> + unsigned long flags;
> + int ret;
> +
> + if (init_attr->create_flags) {
> + dev_warn(&dev->pdev->dev,
> + "invalid create queuepair flags %#x\n",
> + init_attr->create_flags);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (init_attr->qp_type != IB_QPT_RC &&
> + init_attr->qp_type != IB_QPT_UD &&
> + init_attr->qp_type != IB_QPT_GSI) {
> + dev_warn(&dev->pdev->dev, "queuepair type %d not supported\n",
> + init_attr->qp_type);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + if (!atomic_add_unless(&dev->num_qps, 1, dev->dsr->caps.max_qp))
> + return ERR_PTR(-ENOMEM);
> +
> + switch (init_attr->qp_type) {
> + case IB_QPT_GSI:
> + if (init_attr->port_num == 0 ||
> + init_attr->port_num > pd->device->phys_port_cnt ||
> + udata) {
> + dev_warn(&dev->pdev->dev, "invalid queuepair attrs\n");
> + ret = -EINVAL;
> + goto err_qp;
> + }
> + /* fall through */
> + case IB_QPT_RC:
> + case IB_QPT_UD:
> + qp = kzalloc(sizeof(*qp), GFP_KERNEL);
> + if (!qp) {
> + ret = -ENOMEM;
> + goto err_qp;
> + }
> +
> + spin_lock_init(&qp->sq.lock);
> + spin_lock_init(&qp->rq.lock);
> + mutex_init(&qp->mutex);
> + atomic_set(&qp->refcnt, 1);
> + init_waitqueue_head(&qp->wait);
> +
> + qp->state = IB_QPS_RESET;
> +
> + if (pd->uobject && udata) {
> + dev_dbg(&dev->pdev->dev,
> + "create queuepair from user space\n");
> +
> + if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
> + ret = -EFAULT;
> + goto err_qp;
> + }
> +
> + /* set qp->sq.wqe_cnt, shift, buf_size.. */
> + qp->rumem = ib_umem_get(pd->uobject->context,
> + ucmd.rbuf_addr,
> + ucmd.rbuf_size, 0, 0);
> + if (IS_ERR(qp->rumem)) {
> + ret = PTR_ERR(qp->rumem);
> + goto err_qp;
> + }
> +
> + qp->sumem = ib_umem_get(pd->uobject->context,
> + ucmd.sbuf_addr,
> + ucmd.sbuf_size, 0, 0);
> + if (IS_ERR(qp->sumem)) {
> + ib_umem_release(qp->rumem);
> + ret = PTR_ERR(qp->sumem);
> + goto err_qp;
> + }
> +
> + qp->npages_send = ib_umem_page_count(qp->sumem);
> + qp->npages_recv = ib_umem_page_count(qp->rumem);
> + qp->npages = qp->npages_send + qp->npages_recv;
> + } else {
> + qp->is_kernel = true;
> +
> + ret = pvrdma_set_sq_size(to_vdev(pd->device),
> + &init_attr->cap,
> + init_attr->qp_type, qp);
> + if (ret)
> + goto err_qp;
> +
> + ret = pvrdma_set_rq_size(to_vdev(pd->device),
> + &init_attr->cap, qp);
> + if (ret)
> + goto err_qp;
> +
> + qp->npages = qp->npages_send + qp->npages_recv;
> +
> + /* Skip header page. */
> + qp->sq.offset = PAGE_SIZE;
> +
> + /* Recv queue pages are after send pages. */
> + qp->rq.offset = qp->npages_send * PAGE_SIZE;
> + }
> +
> + if (qp->npages < 0 || qp->npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
> + dev_warn(&dev->pdev->dev,
> + "overflow pages in queuepair\n");
> + ret = -EINVAL;
> + goto err_umem;
> + }
> +
> + ret = pvrdma_page_dir_init(dev, &qp->pdir, qp->npages,
> + qp->is_kernel);
> + if (ret) {
> + dev_warn(&dev->pdev->dev,
> + "could not allocate page directory\n");
> + goto err_umem;
> + }
> +
> + if (!qp->is_kernel) {
> + pvrdma_page_dir_insert_umem(&qp->pdir, qp->sumem, 0);
> + pvrdma_page_dir_insert_umem(&qp->pdir, qp->rumem,
> + qp->npages_send);
> + } else {
> + /* Ring state is always the first page. */
> + qp->sq.ring = qp->pdir.pages[0];
> + qp->rq.ring = &qp->sq.ring[1];
> + }
> + break;
> + default:
> + ret = -EINVAL;
> + goto err_qp;
> + }
> +
> + /* Not supported */
> + init_attr->cap.max_inline_data = 0;
> +
> + memset(cmd, 0, sizeof(*cmd));
> + cmd->hdr.cmd = PVRDMA_CMD_CREATE_QP;
> + cmd->pd_handle = to_vpd(pd)->pd_handle;
> + cmd->send_cq_handle = to_vcq(init_attr->send_cq)->cq_handle;
> + cmd->recv_cq_handle = to_vcq(init_attr->recv_cq)->cq_handle;
> + cmd->max_send_wr = init_attr->cap.max_send_wr;
> + cmd->max_recv_wr = init_attr->cap.max_recv_wr;
> + cmd->max_send_sge = init_attr->cap.max_send_sge;
> + cmd->max_recv_sge = init_attr->cap.max_recv_sge;
> + cmd->max_inline_data = init_attr->cap.max_inline_data;
> + cmd->sq_sig_all = (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) ? 1 : 0;
> + cmd->qp_type = ib_qp_type_to_pvrdma(init_attr->qp_type);
> + cmd->access_flags = IB_ACCESS_LOCAL_WRITE;
> + cmd->total_chunks = qp->npages;
> + cmd->send_chunks = qp->npages_send - 1;
> + cmd->pdir_dma = qp->pdir.dir_dma;
> +
> + dev_dbg(&dev->pdev->dev, "create queuepair with %d, %d, %d, %d\n",
> + cmd->max_send_wr, cmd->max_recv_wr, cmd->max_send_sge,
> + cmd->max_recv_sge);
> +
> + ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_QP_RESP);
> + if (ret < 0) {
> + dev_warn(&dev->pdev->dev,
> + "could not create queuepair, error: %d\n", ret);
> + goto err_pdir;
> + }
> +
> + /* max_send_wr/_recv_wr/_send_sge/_recv_sge/_inline_data */
> + qp->qp_handle = resp->qpn;
> + qp->port = init_attr->port_num;
> + qp->ibqp.qp_num = resp->qpn;
> + spin_lock_irqsave(&dev->qp_tbl_lock, flags);
> + dev->qp_tbl[qp->qp_handle % dev->dsr->caps.max_qp] = qp;
> + spin_unlock_irqrestore(&dev->qp_tbl_lock, flags);
> +
> + return &qp->ibqp;
> +
> +err_pdir:
> + pvrdma_page_dir_cleanup(dev, &qp->pdir);
> +err_umem:
> + if (pd->uobject && udata) {
> + if (qp->rumem)
> + ib_umem_release(qp->rumem);
> + if (qp->sumem)
> + ib_umem_release(qp->sumem);
> + }
> +err_qp:
> + kfree(qp);
> + atomic_dec(&dev->num_qps);
> +
> + return ERR_PTR(ret);
> +}
> +
> +static void pvrdma_free_qp(struct pvrdma_qp *qp)
> +{
> + struct pvrdma_dev *dev = to_vdev(qp->ibqp.device);
> + struct pvrdma_cq *scq;
> + struct pvrdma_cq *rcq;
> + unsigned long flags, scq_flags, rcq_flags;
> +
> + /* In case cq is polling */
> + get_cqs(qp, &scq, &rcq);
> + pvrdma_lock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> + _pvrdma_flush_cqe(qp, scq);
> + if (scq != rcq)
> + _pvrdma_flush_cqe(qp, rcq);
> +
> + spin_lock_irqsave(&dev->qp_tbl_lock, flags);
> + dev->qp_tbl[qp->qp_handle] = NULL;
> + spin_unlock_irqrestore(&dev->qp_tbl_lock, flags);
> +
> + pvrdma_unlock_cqs(scq, rcq, &scq_flags, &rcq_flags);
> +
> + atomic_dec(&qp->refcnt);
> + wait_event(qp->wait, !atomic_read(&qp->refcnt));
> +
> + pvrdma_page_dir_cleanup(dev, &qp->pdir);
> +
> + kfree(qp);
> +
> + atomic_dec(&dev->num_qps);
> +}
> +
> +/**
> + * pvrdma_destroy_qp - destroy a queue pair
> + * @qp: the queue pair to destroy
> + *
> + * @return: 0 on success.
> + */
> +int pvrdma_destroy_qp(struct ib_qp *qp)
> +{
> + struct pvrdma_qp *vqp = to_vqp(qp);
> + union pvrdma_cmd_req req;
> + struct pvrdma_cmd_destroy_qp *cmd = &req.destroy_qp;
> + int ret;
> +
> + memset(cmd, 0, sizeof(*cmd));
> + cmd->hdr.cmd = PVRDMA_CMD_DESTROY_QP;
> + cmd->qp_handle = vqp->qp_handle;
> +
> + ret = pvrdma_cmd_post(to_vdev(qp->device), &req, NULL, 0);
> + if (ret < 0)
> + dev_warn(&to_vdev(qp->device)->pdev->dev,
> + "destroy queuepair failed, error: %d\n", ret);
> +
> + pvrdma_free_qp(vqp);
> +
> + return 0;
> +}
> +
> +/**
> + * pvrdma_modify_qp - modify queue pair attributes
> + * @ibqp: the queue pair
> + * @attr: the new queue pair's attributes
> + * @attr_mask: attributes mask
> + * @udata: user data
> + *
> + * @returns 0 on success, otherwise returns an errno.
> + */
> +int pvrdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
> + int attr_mask, struct ib_udata *udata)
> +{
> + struct pvrdma_dev *dev = to_vdev(ibqp->device);
> + struct pvrdma_qp *qp = to_vqp(ibqp);
> + union pvrdma_cmd_req req;
> + union pvrdma_cmd_resp rsp;
> + struct pvrdma_cmd_modify_qp *cmd = &req.modify_qp;
> + int cur_state, next_state;
> + int ret;
> +
> + /* Sanity checking. Should need lock here */
> + mutex_lock(&qp->mutex);
> + cur_state = (attr_mask & IB_QP_CUR_STATE) ? attr->cur_qp_state :
> + qp->state;
> + next_state = (attr_mask & IB_QP_STATE) ? attr->qp_state : cur_state;
> +
> + if (!ib_modify_qp_is_ok(cur_state, next_state, ibqp->qp_type,
> + attr_mask, IB_LINK_LAYER_ETHERNET)) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (attr_mask & IB_QP_PORT) {
> + if (attr->port_num == 0 ||
> + attr->port_num > ibqp->device->phys_port_cnt) {
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + if (attr_mask & IB_QP_MIN_RNR_TIMER) {
> + if (attr->min_rnr_timer > 31) {
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + if (attr_mask & IB_QP_PKEY_INDEX) {
> + if (attr->pkey_index >= dev->dsr->caps.max_pkeys) {
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + if (attr_mask & IB_QP_QKEY)
> + qp->qkey = attr->qkey;
> +
> + if (cur_state == next_state && cur_state == IB_QPS_RESET) {
> + ret = 0;
> + goto out;
> + }
> +
> + qp->state = next_state;
> + memset(cmd, 0, sizeof(*cmd));
> + cmd->hdr.cmd = PVRDMA_CMD_MODIFY_QP;
> + cmd->qp_handle = qp->qp_handle;
> + cmd->attr_mask = ib_qp_attr_mask_to_pvrdma(attr_mask);
> + cmd->attrs.qp_state = ib_qp_state_to_pvrdma(attr->qp_state);
> + cmd->attrs.cur_qp_state =
> + ib_qp_state_to_pvrdma(attr->cur_qp_state);
> + cmd->attrs.path_mtu = ib_mtu_to_pvrdma(attr->path_mtu);
> + cmd->attrs.path_mig_state =
> + ib_mig_state_to_pvrdma(attr->path_mig_state);
> + cmd->attrs.qkey = attr->qkey;
> + cmd->attrs.rq_psn = attr->rq_psn;
> + cmd->attrs.sq_psn = attr->sq_psn;
> + cmd->attrs.dest_qp_num = attr->dest_qp_num;
> + cmd->attrs.qp_access_flags =
> + ib_access_flags_to_pvrdma(attr->qp_access_flags);
> + cmd->attrs.pkey_index = attr->pkey_index;
> + cmd->attrs.alt_pkey_index = attr->alt_pkey_index;
> + cmd->attrs.en_sqd_async_notify = attr->en_sqd_async_notify;
> + cmd->attrs.sq_draining = attr->sq_draining;
> + cmd->attrs.max_rd_atomic = attr->max_rd_atomic;
> + cmd->attrs.max_dest_rd_atomic = attr->max_dest_rd_atomic;
> + cmd->attrs.min_rnr_timer = attr->min_rnr_timer;
> + cmd->attrs.port_num = attr->port_num;
> + cmd->attrs.timeout = attr->timeout;
> + cmd->attrs.retry_cnt = attr->retry_cnt;
> + cmd->attrs.rnr_retry = attr->rnr_retry;
> + cmd->attrs.alt_port_num = attr->alt_port_num;
> + cmd->attrs.alt_timeout = attr->alt_timeout;
> + ib_qp_cap_to_pvrdma(&cmd->attrs.cap, &attr->cap);
> + ib_ah_attr_to_pvrdma(&cmd->attrs.ah_attr, &attr->ah_attr);
> + ib_ah_attr_to_pvrdma(&cmd->attrs.alt_ah_attr, &attr->alt_ah_attr);
> +
> + ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_MODIFY_QP_RESP);
> + if (ret < 0) {
> + dev_warn(&dev->pdev->dev,
> + "could not modify queuepair, error: %d\n", ret);
> + } else if (rsp.hdr.err > 0) {
> + dev_warn(&dev->pdev->dev,
> + "cannot modify queuepair, error: %d\n", rsp.hdr.err);
> + ret = -EINVAL;
> + }
> +
> + if (ret == 0 && next_state == IB_QPS_RESET)
> + pvrdma_reset_qp(qp);
> +
> +out:
> + mutex_unlock(&qp->mutex);
> +
> + return ret;
> +}
> +
> +static inline void *get_sq_wqe(struct pvrdma_qp *qp, int n)
> +{
> + return pvrdma_page_dir_get_ptr(&qp->pdir,
> + qp->sq.offset + n * qp->sq.wqe_size);
> +}
> +
> +static inline void *get_rq_wqe(struct pvrdma_qp *qp, int n)
> +{
> + return pvrdma_page_dir_get_ptr(&qp->pdir,
> + qp->rq.offset + n * qp->rq.wqe_size);
> +}
> +
> +static int set_reg_seg(struct pvrdma_sq_wqe_hdr *wqe_hdr, struct ib_reg_wr *wr)
> +{
> + struct pvrdma_user_mr *mr = to_vmr(wr->mr);
> +
> + wqe_hdr->wr.fast_reg.iova_start = mr->ibmr.iova;
> + wqe_hdr->wr.fast_reg.pl_pdir_dma = mr->pdir.dir_dma;
> + wqe_hdr->wr.fast_reg.page_shift = mr->page_shift;
> + wqe_hdr->wr.fast_reg.page_list_len = mr->npages;
> + wqe_hdr->wr.fast_reg.length = mr->ibmr.length;
> + wqe_hdr->wr.fast_reg.access_flags = wr->access;
> + wqe_hdr->wr.fast_reg.rkey = wr->key;
> +
> + return pvrdma_page_dir_insert_page_list(&mr->pdir, mr->pages,
> + mr->npages);
> +}
> +
> +/**
> + * pvrdma_post_send - post send work request entries on a QP
> + * @ibqp: the QP
> + * @wr: work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> + struct ib_send_wr **bad_wr)
> +{
> + struct pvrdma_qp *qp = to_vqp(ibqp);
> + struct pvrdma_dev *dev = to_vdev(ibqp->device);
> + unsigned long flags;
> + struct pvrdma_sq_wqe_hdr *wqe_hdr;
> + struct pvrdma_sge *sge;
> + int i, index;
> + int nreq;
> + int ret;
> +
> + /*
> + * In states lower than RTS, we can fail immediately. In other states,
> + * just post and let the device figure it out.
> + */
> + if (qp->state < IB_QPS_RTS) {
> + *bad_wr = wr;
> + return -EINVAL;
> + }
> +
> + spin_lock_irqsave(&qp->sq.lock, flags);
> +
> + index = pvrdma_idx(&qp->sq.ring->prod_tail, qp->sq.wqe_cnt);
Not sure if this was discussed, so posting just in case.
I believe it is unlikely that index will go out of range, but since
pvrdma_idx might return PVRDMA_INVALID_IDX I suggest adding a check here.
Something like:
if (unlikely(index == PVRDMA_INVALID_IDX)) {
	*bad_wr = wr;
	/* sq.lock is already held here, so leave through the unlock path */
	ret = -EFAULT;
	goto out;
}
The same goes for pvrdma_post_recv.
> + for (nreq = 0; wr; nreq++, wr = wr->next) {
> + unsigned int tail;
> +
> + if (unlikely(!pvrdma_idx_ring_has_space(
> + qp->sq.ring, qp->sq.wqe_cnt, &tail))) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "send queue is full\n");
> + *bad_wr = wr;
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (unlikely(wr->num_sge > qp->sq.max_sg || wr->num_sge < 0)) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "send SGE overflow\n");
> + *bad_wr = wr;
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (unlikely(wr->opcode < 0)) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "invalid send opcode\n");
> + *bad_wr = wr;
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + /*
> + * Only support UD, RC.
> + * Need to check opcode table for thorough checking.
> + * opcode _UD _UC _RC
> + * _SEND x x x
> + * _SEND_WITH_IMM x x x
> + * _RDMA_WRITE x x
> + * _RDMA_WRITE_WITH_IMM x x
> + * _LOCAL_INV x x
> + * _SEND_WITH_INV x x
> + * _RDMA_READ x
> + * _ATOMIC_CMP_AND_SWP x
> + * _ATOMIC_FETCH_AND_ADD x
> + * _MASK_ATOMIC_CMP_AND_SWP x
> + * _MASK_ATOMIC_FETCH_AND_ADD x
> + * _REG_MR x
> + *
> + */
> + if (qp->ibqp.qp_type != IB_QPT_UD &&
> + qp->ibqp.qp_type != IB_QPT_RC &&
> + wr->opcode != IB_WR_SEND) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "unsupported queuepair type\n");
> + *bad_wr = wr;
> + ret = -EINVAL;
> + goto out;
> + } else if (qp->ibqp.qp_type == IB_QPT_UD ||
> + qp->ibqp.qp_type == IB_QPT_GSI) {
> + if (wr->opcode != IB_WR_SEND &&
> + wr->opcode != IB_WR_SEND_WITH_IMM) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "invalid send opcode\n");
> + *bad_wr = wr;
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + wqe_hdr = (struct pvrdma_sq_wqe_hdr *)get_sq_wqe(qp, index);
> + memset(wqe_hdr, 0, sizeof(*wqe_hdr));
> + wqe_hdr->wr_id = wr->wr_id;
> + wqe_hdr->num_sge = wr->num_sge;
> + wqe_hdr->opcode = ib_wr_opcode_to_pvrdma(wr->opcode);
> + wqe_hdr->send_flags = ib_send_flags_to_pvrdma(wr->send_flags);
> + if (wr->opcode == IB_WR_SEND_WITH_IMM ||
> + wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM)
> + wqe_hdr->ex.imm_data = wr->ex.imm_data;
> +
> + switch (qp->ibqp.qp_type) {
> + case IB_QPT_GSI:
> + case IB_QPT_UD:
> + if (unlikely(!ud_wr(wr)->ah)) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "invalid address handle\n");
> + *bad_wr = wr;
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + /*
> + * Use qkey from qp context if high order bit set,
> + * otherwise from work request.
> + */
> + wqe_hdr->wr.ud.remote_qpn = ud_wr(wr)->remote_qpn;
> + wqe_hdr->wr.ud.remote_qkey =
> + ud_wr(wr)->remote_qkey & 0x80000000 ?
> + qp->qkey : ud_wr(wr)->remote_qkey;
> + wqe_hdr->wr.ud.av = to_vah(ud_wr(wr)->ah)->av;
> +
> + break;
> + case IB_QPT_RC:
> + switch (wr->opcode) {
> + case IB_WR_RDMA_READ:
> + case IB_WR_RDMA_WRITE:
> + case IB_WR_RDMA_WRITE_WITH_IMM:
> + wqe_hdr->wr.rdma.remote_addr =
> + rdma_wr(wr)->remote_addr;
> + wqe_hdr->wr.rdma.rkey = rdma_wr(wr)->rkey;
> + break;
> + case IB_WR_LOCAL_INV:
> + case IB_WR_SEND_WITH_INV:
> + wqe_hdr->ex.invalidate_rkey =
> + wr->ex.invalidate_rkey;
> + break;
> + case IB_WR_ATOMIC_CMP_AND_SWP:
> + case IB_WR_ATOMIC_FETCH_AND_ADD:
> + wqe_hdr->wr.atomic.remote_addr =
> + atomic_wr(wr)->remote_addr;
> + wqe_hdr->wr.atomic.rkey = atomic_wr(wr)->rkey;
> + wqe_hdr->wr.atomic.compare_add =
> + atomic_wr(wr)->compare_add;
> + if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP)
> + wqe_hdr->wr.atomic.swap =
> + atomic_wr(wr)->swap;
> + break;
> + case IB_WR_REG_MR:
> + ret = set_reg_seg(wqe_hdr, reg_wr(wr));
> + if (ret < 0) {
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "Failed to set fast register work request\n");
> + *bad_wr = wr;
> + goto out;
> + }
> + break;
> + default:
> + break;
> + }
> +
> + break;
> + default:
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "invalid queuepair type\n");
> + ret = -EINVAL;
> + *bad_wr = wr;
> + goto out;
> + }
> +
> + sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> + for (i = 0; i < wr->num_sge; i++) {
> + /* Need to check wqe_size 0 or max size */
> + sge->addr = wr->sg_list[i].addr;
> + sge->length = wr->sg_list[i].length;
> + sge->lkey = wr->sg_list[i].lkey;
> + sge++;
> + }
> +
> + /* Make sure wqe is written before index update */
> + smp_wmb();
> +
> + index++;
> + if (unlikely(index >= qp->sq.wqe_cnt))
> + index = 0;
> + /* Update shared sq ring */
> + pvrdma_idx_ring_inc(&qp->sq.ring->prod_tail,
> + qp->sq.wqe_cnt);
> + }
> +
> + ret = 0;
> +
> +out:
> + spin_unlock_irqrestore(&qp->sq.lock, flags);
> +
> + if (!ret)
> + pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_SEND | qp->qp_handle);
> +
> + return ret;
> +}
> +
> +/**
> + * pvrdma_post_recv - post receive work request entries on a QP
> + * @ibqp: the QP
> + * @wr: the work request list to post
> + * @bad_wr: the first bad WR returned
> + *
> + * @return: 0 on success, otherwise errno returned.
> + */
> +int pvrdma_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
> + struct ib_recv_wr **bad_wr)
> +{
> + struct pvrdma_dev *dev = to_vdev(ibqp->device);
> + unsigned long flags;
> + struct pvrdma_qp *qp = to_vqp(ibqp);
> + struct pvrdma_rq_wqe_hdr *wqe_hdr;
> + struct pvrdma_sge *sge;
> + int index, nreq;
> + int ret = 0;
> + int i;
> +
> + /*
> + * In the RESET state, we can fail immediately. For other states,
> + * just post and let the device figure it out.
> + */
> + if (qp->state == IB_QPS_RESET) {
> + *bad_wr = wr;
> + return -EINVAL;
> + }
> +
> + spin_lock_irqsave(&qp->rq.lock, flags);
> +
> + index = pvrdma_idx(&qp->rq.ring->prod_tail, qp->rq.wqe_cnt);
> + for (nreq = 0; wr; nreq++, wr = wr->next) {
> + unsigned int tail;
> +
> + if (unlikely(wr->num_sge > qp->rq.max_sg ||
> + wr->num_sge < 0)) {
> + ret = -EINVAL;
> + *bad_wr = wr;
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "recv SGE overflow\n");
> + goto out;
> + }
> +
> + if (unlikely(!pvrdma_idx_ring_has_space(
> + qp->rq.ring, qp->rq.wqe_cnt, &tail))) {
> + ret = -ENOMEM;
> + *bad_wr = wr;
> + dev_warn_ratelimited(&dev->pdev->dev,
> + "recv queue full\n");
> + goto out;
> + }
> +
> + wqe_hdr = (struct pvrdma_rq_wqe_hdr *)get_rq_wqe(qp, index);
> + wqe_hdr->wr_id = wr->wr_id;
> + wqe_hdr->num_sge = wr->num_sge;
> + wqe_hdr->total_len = 0;
> +
> + sge = (struct pvrdma_sge *)(wqe_hdr + 1);
> + for (i = 0; i < wr->num_sge; i++) {
> + sge->addr = wr->sg_list[i].addr;
> + sge->length = wr->sg_list[i].length;
> + sge->lkey = wr->sg_list[i].lkey;
> + sge++;
> + }
> +
> + /* Make sure wqe is written before index update */
> + smp_wmb();
> +
> + index++;
> + if (unlikely(index >= qp->rq.wqe_cnt))
> + index = 0;
> + /* Update shared rq ring */
> + pvrdma_idx_ring_inc(&qp->rq.ring->prod_tail,
> + qp->rq.wqe_cnt);
> + }
> +
> + spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> + pvrdma_write_uar_qp(dev, PVRDMA_UAR_QP_RECV | qp->qp_handle);
> +
> + return ret;
> +
> +out:
> + spin_unlock_irqrestore(&qp->rq.lock, flags);
> +
> + return ret;
> +}
> +
> +/**
> + * pvrdma_query_qp - query a queue pair's attributes
> + * @ibqp: the queue pair to query
> + * @attr: the queue pair's attributes
> + * @attr_mask: attributes mask
> + * @init_attr: initial queue pair attributes
> + *
> + * @returns 0 on success, otherwise returns an errno.
> + */
> +int pvrdma_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
> + int attr_mask, struct ib_qp_init_attr *init_attr)
> +{
> + struct pvrdma_dev *dev = to_vdev(ibqp->device);
> + struct pvrdma_qp *qp = to_vqp(ibqp);
> + union pvrdma_cmd_req req;
> + union pvrdma_cmd_resp rsp;
> + struct pvrdma_cmd_query_qp *cmd = &req.query_qp;
> + struct pvrdma_cmd_query_qp_resp *resp = &rsp.query_qp_resp;
> + int ret = 0;
> +
> + mutex_lock(&qp->mutex);
> +
> + if (qp->state == IB_QPS_RESET) {
> + attr->qp_state = IB_QPS_RESET;
> + goto out;
> + }
> +
> + memset(cmd, 0, sizeof(*cmd));
> + cmd->hdr.cmd = PVRDMA_CMD_QUERY_QP;
> + cmd->qp_handle = qp->qp_handle;
> + cmd->attr_mask = ib_qp_attr_mask_to_pvrdma(attr_mask);
> +
> + ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_QUERY_QP_RESP);
> + if (ret < 0) {
> + dev_warn(&dev->pdev->dev,
> + "could not query queuepair, error: %d\n", ret);
> + goto out;
> + }
> +
> + attr->qp_state = pvrdma_qp_state_to_ib(resp->attrs.qp_state);
> + attr->cur_qp_state =
> + pvrdma_qp_state_to_ib(resp->attrs.cur_qp_state);
> + attr->path_mtu = pvrdma_mtu_to_ib(resp->attrs.path_mtu);
> + attr->path_mig_state =
> + pvrdma_mig_state_to_ib(resp->attrs.path_mig_state);
> + attr->qkey = resp->attrs.qkey;
> + attr->rq_psn = resp->attrs.rq_psn;
> + attr->sq_psn = resp->attrs.sq_psn;
> + attr->dest_qp_num = resp->attrs.dest_qp_num;
> + attr->qp_access_flags =
> + pvrdma_access_flags_to_ib(resp->attrs.qp_access_flags);
> + attr->pkey_index = resp->attrs.pkey_index;
> + attr->alt_pkey_index = resp->attrs.alt_pkey_index;
> + attr->en_sqd_async_notify = resp->attrs.en_sqd_async_notify;
> + attr->sq_draining = resp->attrs.sq_draining;
> + attr->max_rd_atomic = resp->attrs.max_rd_atomic;
> + attr->max_dest_rd_atomic = resp->attrs.max_dest_rd_atomic;
> + attr->min_rnr_timer = resp->attrs.min_rnr_timer;
> + attr->port_num = resp->attrs.port_num;
> + attr->timeout = resp->attrs.timeout;
> + attr->retry_cnt = resp->attrs.retry_cnt;
> + attr->rnr_retry = resp->attrs.rnr_retry;
> + attr->alt_port_num = resp->attrs.alt_port_num;
> + attr->alt_timeout = resp->attrs.alt_timeout;
> + pvrdma_qp_cap_to_ib(&attr->cap, &resp->attrs.cap);
> + pvrdma_ah_attr_to_ib(&attr->ah_attr, &resp->attrs.ah_attr);
> + pvrdma_ah_attr_to_ib(&attr->alt_ah_attr, &resp->attrs.alt_ah_attr);
> +
> + qp->state = attr->qp_state;
> +
> + ret = 0;
> +
> +out:
> + attr->cur_qp_state = attr->qp_state;
> +
> + init_attr->event_handler = qp->ibqp.event_handler;
> + init_attr->qp_context = qp->ibqp.qp_context;
> + init_attr->send_cq = qp->ibqp.send_cq;
> + init_attr->recv_cq = qp->ibqp.recv_cq;
> + init_attr->srq = qp->ibqp.srq;
> + init_attr->xrcd = NULL;
> + init_attr->cap = attr->cap;
> + init_attr->sq_sig_type = 0;
> + init_attr->qp_type = qp->ibqp.qp_type;
> + init_attr->create_flags = 0;
> + init_attr->port_num = qp->port;
> +
> + mutex_unlock(&qp->mutex);
> + return ret;
> +}
> --
> 2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-14 11:07 UTC (permalink / raw)
To: Leon Romanovsky,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi Leon, All,
While inspecting the mlx4 code, as well as other vendors' drivers, I see that cq->cqe is set to one less than 'entries'. Why is that?
e.g.
struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
const struct ib_cq_init_attr *attr,
struct ib_ucontext *context,
struct ib_udata *udata)
{
...
cq->ibcq.cqe = entries - 1;
...
}
static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq,
int entries)
{
...
cq->resize_buf->cqe = entries - 1; // this is later copied to cq->ibcq.cqe
...
}
Thanks,
Ram
^ permalink raw reply
* Re: [PATCH] IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR
From: Leon Romanovsky @ 2016-11-14 6:00 UTC (permalink / raw)
To: Julia Lawall
Cc: Christian Benvenuti, kernel-janitors, Dave Goodell, Doug Ledford,
Sean Hefty, Hal Rosenstock, linux-rdma, linux-kernel,
Christophe JAILLET
In-Reply-To: <1478891066-16093-1-git-send-email-Julia.Lawall@lip6.fr>
[-- Attachment #1: Type: text/plain, Size: 889 bytes --]
On Fri, Nov 11, 2016 at 08:04:26PM +0100, Julia Lawall wrote:
> The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a
> valid pointer, never NULL. The same is true of get_qp_res_chunk, which
> just returns the result of calling usnic_ib_qp_grp_get_chunk. Simplify
> IS_ERR_OR_NULL to IS_ERR in both cases.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
>
> // <smpl>
> @@
> expression t,e;
> @@
>
> t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
> ... when != t=e
> - IS_ERR_OR_NULL(t)
> + IS_ERR(t)
>
> @@
> expression t,e,e1;
> @@
>
> t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
> ... when != t=e
> ?- t ? PTR_ERR(t) : e1
> + PTR_ERR(t)
> ... when any
> // </smpl>
>
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Thanks, Julia.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
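For readers less familiar with the idiom behind this cleanup, a minimal userspace sketch of error-pointer encoding follows. The model_* names are hypothetical, and the 4095 limit mirrors the usual kernel convention as an assumption of this example. It shows why a NULL test adds nothing once a function is known to return either a valid pointer or an encoded error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of error-pointer encoding; all names are hypothetical. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error codes occupy the top MODEL_MAX_ERRNO values of the address space. */
static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

static inline int model_is_err_or_null(const void *ptr)
{
	return !ptr || model_is_err(ptr);
}

/* Hypothetical lookup that returns a valid pointer or an error, never NULL. */
static void *model_get_chunk(int *table, int n, int want)
{
	if (want < 0 || want >= n)
		return model_err_ptr(-EINVAL);
	return &table[want];
}
```

For any value model_get_chunk() can return, model_is_err() and model_is_err_or_null() agree, so the shorter check is sufficient. The two only differ for NULL, which this function never produces.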
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH v5 rdma-core 0/7] libhns: userspace library for hns
From: Leon Romanovsky @ 2016-11-14 5:53 UTC (permalink / raw)
To: Lijun Ou
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]
On Sun, Nov 13, 2016 at 06:35:53PM +0800, Lijun Ou wrote:
> This patch series introduces userspace library for hns RoCE driver.
>
> changes v4 -> v5:
> 1. eliminate the warning when CFLAGS is set to -m32
>
> changes v3 -> v4:
> 1. eliminate the warning by Travis CI testing
>
> changes v2 -> v3:
> 1. Fix the code style, for example, if (addr == NULL)
> 2. Fix the bug for hns_roce_u_reg_mr
>
> changes v1 -> v2:
> 1. Delete the min() definition and use the ccan header instead
> 2. Delete the CHECK_C_SOURCE_COMPILES
> 3. sort the c file in rdma_provider()
> 4. Delete the unused code in hns_roce_u_db.h
>
> Lijun Ou (7):
> libhns: Add initial main frame
> libhns: Add verbs of querying device and querying port
> libhns: Add verbs of pd and mr support
> libhns: Add verbs of cq support
> libhns: Add verbs of qp support
> libhns: Add verbs of post_send and post_recv support
> libhns: Add consolidated repo for userspace library of hns
Thanks, applied.
https://github.com/linux-rdma/rdma-core/pull/38
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH, RESEND] IB/srpt: Report login failures only once
From: Max Gurtovoy @ 2016-11-13 17:29 UTC (permalink / raw)
To: Bart Van Assche, Doug Ledford
Cc: Nicholas A. Bellinger, Christoph Hellwig, Sagi Grimberg,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <7737d8fc-d41d-0755-d7d2-a3a2b9b6a76e-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
On 11/12/2016 2:36 AM, Bart Van Assche wrote:
> Report the following message only once if no ACL has been configured
> yet for an initiator port:
>
> "Rejected login because no ACL has been configured yet for initiator %s.\n"
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>
Looks good.
Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
^ permalink raw reply
* RE: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-13 16:42 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Elior, Ariel, Kalderon, Michal, Mintz, Yuval,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477400039-16925-1-git-send-email-Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> The recent introduction of qedr driver support in qede causes a GPF when
> probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids
> using an uninitialized pointer in such a case. Caught by the kernel test robot.
>
> Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> ---
> drivers/net/ethernet/qlogic/qede/qede_roce.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_roce.c
> b/drivers/net/ethernet/qlogic/qede/qede_roce.c
> index 9867f96..4927271 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_roce.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_roce.c
> @@ -191,8 +191,8 @@ int qede_roce_register_driver(struct qedr_driver *drv)
> }
> mutex_unlock(&qedr_dev_list_lock);
>
> - DP_INFO(edev, "qedr: discovered and registered %d RoCE funcs\n",
> - qedr_counter);
> + pr_notice("qedr: discovered and registered %d RoCE funcs\n",
> + qedr_counter);
>
> return 0;
> }
> --
> 1.8.3.1
Hi Doug,
Can you let me know whether this patch has been taken and, if not, when you expect it will be?
Thanks,
Ram
^ permalink raw reply
* [PATCH v5 rdma-core 7/7] libhns: Add consolidated repo for userspace library of hns
From: Lijun Ou @ 2016-11-13 10:36 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch configures the consolidated repo to build the userspace
library for hns (libhns).
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3:
- No change over v2
v2:
- Delete the CHECK_C_SOURCE_COMPILES and sort the .c file
v1:
- The initial submit
---
CMakeLists.txt | 1 +
MAINTAINERS | 6 ++++++
README.md | 1 +
providers/hns/CMakeLists.txt | 6 ++++++
4 files changed, 14 insertions(+)
create mode 100644 providers/hns/CMakeLists.txt
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0ac7477..3c2aa79 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -328,6 +328,7 @@ add_subdirectory(libibcm)
add_subdirectory(providers/cxgb3)
add_subdirectory(providers/cxgb4)
add_subdirectory(providers/hfi1verbs)
+add_subdirectory(providers/hns)
add_subdirectory(providers/i40iw)
add_subdirectory(providers/ipathverbs)
add_subdirectory(providers/mlx4)
diff --git a/MAINTAINERS b/MAINTAINERS
index d83de10..bc6eb50 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,6 +57,12 @@ S: Supported
L: intel-opa-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org (moderated for non-subscribers)
F: providers/hfi1verbs/
+HNS USERSPACE PROVIDER (for hns-roce.ko)
+M: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+M: Wei Hu(Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
+S: Supported
+F: providers/hns/
+
I40IW USERSPACE PROVIDER (for i40iw.ko)
M: Tatyana Nikolova <Tatyana.E.Nikolova-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
S: Supported
diff --git a/README.md b/README.md
index 3a13042..e3bc33f 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ is included:
- iw_cxgb3.ko
- iw_cxgb4.ko
- hfi1.ko
+ - hns-roce.ko
- i40iw.ko
- ib_qib.ko
- mlx4_ib.ko
diff --git a/providers/hns/CMakeLists.txt b/providers/hns/CMakeLists.txt
new file mode 100644
index 0000000..19a793e
--- /dev/null
+++ b/providers/hns/CMakeLists.txt
@@ -0,0 +1,6 @@
+rdma_provider(hns
+ hns_roce_u.c
+ hns_roce_u_buf.c
+ hns_roce_u_hw_v1.c
+ hns_roce_u_verbs.c
+)
--
1.9.1
^ permalink raw reply related
* [PATCH v5 rdma-core 6/7] libhns: Add verbs of post_send and post_recv support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces the verbs for posting send
and posting recv.
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3/v2:
- No change over the v1
v1:
- The initial submit
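The overflow check and ring indexing in this patch rely on free-running unsigned head and tail counters. The sketch below is a simplified model with hypothetical names (model_wq, model_wq_overflow, model_slot). It leaves out the CQ lock that the patch takes before re-reading the counters, and it assumes wqe_cnt is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a work queue with free-running counters. */
struct model_wq {
	uint32_t head;     /* total entries ever posted */
	uint32_t tail;     /* total entries ever completed */
	uint32_t max_post; /* posting limit */
	uint32_t wqe_cnt;  /* ring size, assumed to be a power of two */
};

/* Nonzero if posting nreq more entries would reach the posting limit. */
static int model_wq_overflow(const struct model_wq *wq, uint32_t nreq)
{
	/* Unsigned subtraction is modulo 2^32, so it survives wraparound. */
	uint32_t cur = wq->head - wq->tail;

	return cur + nreq >= wq->max_post;
}

/* Map a free-running index to its slot in the ring. */
static uint32_t model_slot(const struct model_wq *wq, uint32_t ind)
{
	return ind & (wq->wqe_cnt - 1);
}
```

Because the counters are never reset, head may be numerically smaller than tail after a wrap. The difference still gives the number of outstanding entries, and masking with wqe_cnt - 1 turns either counter into a slot index.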
---
providers/hns/hns_roce_u.c | 2 +
providers/hns/hns_roce_u.h | 8 +
providers/hns/hns_roce_u_hw_v1.c | 314 +++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u_hw_v1.h | 79 ++++++++++
4 files changed, 403 insertions(+)
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index de2fd57..281f9f4 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -131,6 +131,8 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.query_qp = hns_roce_u_query_qp;
context->ibv_ctx.ops.modify_qp = hr_dev->u_hw->modify_qp;
context->ibv_ctx.ops.destroy_qp = hr_dev->u_hw->destroy_qp;
+ context->ibv_ctx.ops.post_send = hr_dev->u_hw->post_send;
+ context->ibv_ctx.ops.post_recv = hr_dev->u_hw->post_recv;
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 02b9251..4a6ed8e 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -51,6 +51,10 @@
#define PFX "hns: "
+#ifndef likely
+#define likely(x) __builtin_expect(!!(x), 1)
+#endif
+
#define roce_get_field(origin, mask, shift) \
(((origin) & (mask)) >> (shift))
@@ -171,6 +175,10 @@ struct hns_roce_qp {
struct hns_roce_u_hw {
int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+ int (*post_send)(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr);
+ int (*post_recv)(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr);
int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
int attr_mask);
int (*destroy_qp)(struct ibv_qp *ibqp);
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index e5c7f6a..e5cfe48 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -37,6 +37,59 @@
#include "hns_roce_u_hw_v1.h"
#include "hns_roce_u.h"
+static inline void set_raddr_seg(struct hns_roce_wqe_raddr_seg *rseg,
+ uint64_t remote_addr, uint32_t rkey)
+{
+ rseg->raddr = remote_addr;
+ rseg->rkey = rkey;
+ rseg->len = 0;
+}
+
+static void set_data_seg(struct hns_roce_wqe_data_seg *dseg, struct ibv_sge *sg)
+{
+
+ dseg->lkey = sg->lkey;
+ dseg->addr = sg->addr;
+ dseg->len = sg->length;
+}
+
+static void hns_roce_update_rq_head(struct hns_roce_context *ctx,
+ unsigned int qpn, unsigned int rq_head)
+{
+ struct hns_roce_rq_db rq_db;
+
+ rq_db.u32_4 = 0;
+ rq_db.u32_8 = 0;
+
+ roce_set_field(rq_db.u32_4, RQ_DB_U32_4_RQ_HEAD_M,
+ RQ_DB_U32_4_RQ_HEAD_S, rq_head);
+ roce_set_field(rq_db.u32_8, RQ_DB_U32_8_QPN_M, RQ_DB_U32_8_QPN_S, qpn);
+ roce_set_field(rq_db.u32_8, RQ_DB_U32_8_CMD_M, RQ_DB_U32_8_CMD_S, 1);
+ roce_set_bit(rq_db.u32_8, RQ_DB_U32_8_HW_SYNC_S, 1);
+
+ hns_roce_write64((uint32_t *)&rq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_update_sq_head(struct hns_roce_context *ctx,
+ unsigned int qpn, unsigned int port,
+ unsigned int sl, unsigned int sq_head)
+{
+ struct hns_roce_sq_db sq_db;
+
+ sq_db.u32_4 = 0;
+ sq_db.u32_8 = 0;
+
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SQ_HEAD_M,
+ SQ_DB_U32_4_SQ_HEAD_S, sq_head);
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_PORT_M, SQ_DB_U32_4_PORT_S,
+ port);
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SL_M, SQ_DB_U32_4_SL_S, sl);
+ roce_set_field(sq_db.u32_8, SQ_DB_U32_8_QPN_M, SQ_DB_U32_8_QPN_S, qpn);
+ roce_set_bit(sq_db.u32_8, SQ_DB_U32_8_HW_SYNC, 1);
+
+ hns_roce_write64((uint32_t *)&sq_db, ctx, ROCEE_DB_SQ_L_0_REG);
+}
+
static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
struct hns_roce_cq *cq)
{
@@ -126,6 +179,16 @@ static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
return get_sw_cqe(cq, cq->cons_index);
}
+static void *get_recv_wqe(struct hns_roce_qp *qp, int n)
+{
+ if ((n < 0) || (n >= qp->rq.wqe_cnt)) {
+ printf("rq wqe index:%d,rq wqe cnt:%d\r\n", n, qp->rq.wqe_cnt);
+ return NULL;
+ }
+
+ return qp->buf.buf + qp->rq.offset + (n << qp->rq.wqe_shift);
+}
+
static void *get_send_wqe(struct hns_roce_qp *qp, int n)
{
if ((n < 0) || (n > qp->sq.wqe_cnt)) {
@@ -136,6 +199,26 @@ static void *get_send_wqe(struct hns_roce_qp *qp, int n)
return (void *)(qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift));
}
+static int hns_roce_wq_overflow(struct hns_roce_wq *wq, int nreq,
+ struct hns_roce_cq *cq)
+{
+ unsigned int cur;
+
+ cur = wq->head - wq->tail;
+ if (cur + nreq < wq->max_post)
+ return 0;
+
+ /* While the num of wqe exceeds cap of the device, cq will be locked */
+ pthread_spin_lock(&cq->lock);
+ cur = wq->head - wq->tail;
+ pthread_spin_unlock(&cq->lock);
+
+ printf("wq:(head = %d, tail = %d, max_post = %d), nreq = 0x%x\n",
+ wq->head, wq->tail, wq->max_post, nreq);
+
+ return cur + nreq >= wq->max_post;
+}
+
static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
uint32_t qpn)
{
@@ -372,6 +455,144 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
return 0;
}
+static int hns_roce_u_v1_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr)
+{
+ unsigned int ind;
+ void *wqe;
+ int nreq;
+ int ps_opcode, i;
+ int ret = 0;
+ struct hns_roce_wqe_ctrl_seg *ctrl = NULL;
+ struct hns_roce_wqe_data_seg *dseg = NULL;
+ struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+ pthread_spin_lock(&qp->sq.lock);
+
+ /* check that state is OK to post send */
+ ind = qp->sq.head;
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (hns_roce_wq_overflow(&qp->sq, nreq,
+ to_hr_cq(qp->ibv_qp.send_cq))) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+ if (wr->num_sge > qp->sq.max_gs) {
+ ret = -1;
+ *bad_wr = wr;
+ printf("wr->num_sge(<=%d) = %d, check failed!\r\n",
+ qp->sq.max_gs, wr->num_sge);
+ goto out;
+ }
+
+ ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+ memset(ctrl, 0, sizeof(struct hns_roce_wqe_ctrl_seg));
+
+ qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
+ for (i = 0; i < wr->num_sge; i++)
+ ctrl->msg_length += wr->sg_list[i].length;
+
+
+ ctrl->flag |= ((wr->send_flags & IBV_SEND_SIGNALED) ?
+ HNS_ROCE_WQE_CQ_NOTIFY : 0) |
+ (wr->send_flags & IBV_SEND_SOLICITED ?
+ HNS_ROCE_WQE_SE : 0) |
+ ((wr->opcode == IBV_WR_SEND_WITH_IMM ||
+ wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ?
+ HNS_ROCE_WQE_IMM : 0) |
+ (wr->send_flags & IBV_SEND_FENCE ?
+ HNS_ROCE_WQE_FENCE : 0);
+
+ if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
+ wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
+ ctrl->imm_data = wr->imm_data;
+
+ wqe += sizeof(struct hns_roce_wqe_ctrl_seg);
+
+ /* set remote addr segment */
+ switch (ibvqp->qp_type) {
+ case IBV_QPT_RC:
+ switch (wr->opcode) {
+ case IBV_WR_RDMA_READ:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_READ;
+ set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+ wr->wr.rdma.rkey);
+ break;
+ case IBV_WR_RDMA_WRITE:
+ case IBV_WR_RDMA_WRITE_WITH_IMM:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_WRITE;
+ set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+ wr->wr.rdma.rkey);
+ break;
+ case IBV_WR_SEND:
+ case IBV_WR_SEND_WITH_IMM:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_SEND;
+ break;
+ case IBV_WR_ATOMIC_CMP_AND_SWP:
+ case IBV_WR_ATOMIC_FETCH_AND_ADD:
+ default:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_MASK;
+ break;
+ }
+ ctrl->flag |= (ps_opcode);
+ wqe += sizeof(struct hns_roce_wqe_raddr_seg);
+ break;
+ case IBV_QPT_UC:
+ case IBV_QPT_UD:
+ default:
+ break;
+ }
+
+ dseg = wqe;
+
+ /* Inline */
+ if (wr->send_flags & IBV_SEND_INLINE && wr->num_sge) {
+ if (ctrl->msg_length > qp->max_inline_data) {
+ ret = -1;
+ *bad_wr = wr;
+ printf("inline data len(1-32)=%d, send_flags = 0x%x, check failed!\r\n",
+ ctrl->msg_length, wr->send_flags);
+ goto out;
+ }
+
+ for (i = 0; i < wr->num_sge; i++) {
+ memcpy(wqe,
+ ((void *) (uintptr_t) wr->sg_list[i].addr),
+ wr->sg_list[i].length);
+ wqe = wqe + wr->sg_list[i].length;
+ }
+
+ ctrl->flag |= HNS_ROCE_WQE_INLINE;
+ } else {
+ /* set sge */
+ for (i = 0; i < wr->num_sge; i++)
+ set_data_seg(dseg+i, wr->sg_list + i);
+
+ ctrl->flag |= wr->num_sge << HNS_ROCE_WQE_SGE_NUM_BIT;
+ }
+
+ ind++;
+ }
+
+out:
+ /* Set DB return */
+ if (likely(nreq)) {
+ qp->sq.head += nreq;
+ wmb();
+
+ hns_roce_update_sq_head(ctx, qp->ibv_qp.qp_num,
+ qp->port_num - 1, qp->sl,
+ qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
+ }
+
+ pthread_spin_unlock(&qp->sq.lock);
+
+ return ret;
+}
+
static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
struct hns_roce_srq *srq)
{
@@ -515,9 +736,102 @@ static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
return ret;
}
+static int hns_roce_u_v1_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr)
+{
+ int ret = 0;
+ int nreq;
+ int ind;
+ struct ibv_sge *sg;
+ struct hns_roce_rc_rq_wqe *rq_wqe;
+ struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+ pthread_spin_lock(&qp->rq.lock);
+
+ /* check that state is OK to post receive */
+ ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (hns_roce_wq_overflow(&qp->rq, nreq,
+ to_hr_cq(qp->ibv_qp.recv_cq))) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge > qp->rq.max_gs) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ rq_wqe = get_recv_wqe(qp, ind);
+ if (wr->num_sge > HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM);
+ sg = wr->sg_list;
+
+ rq_wqe->va0 = (sg->addr);
+ rq_wqe->l_key0 = (sg->lkey);
+ rq_wqe->length0 = (sg->length);
+
+ sg = wr->sg_list + 1;
+
+ rq_wqe->va1 = (sg->addr);
+ rq_wqe->l_key1 = (sg->lkey);
+ rq_wqe->length1 = (sg->length);
+ } else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1);
+ sg = wr->sg_list;
+
+ rq_wqe->va0 = (sg->addr);
+ rq_wqe->l_key0 = (sg->lkey);
+ rq_wqe->length0 = (sg->length);
+
+ } else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2);
+ }
+
+ qp->rq.wrid[ind] = wr->wr_id;
+
+ ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+ }
+
+out:
+ if (nreq) {
+ qp->rq.head += nreq;
+
+ wmb();
+
+ hns_roce_update_rq_head(ctx, qp->ibv_qp.qp_num,
+ qp->rq.head & ((qp->rq.wqe_cnt << 1) - 1));
+ }
+
+ pthread_spin_unlock(&qp->rq.lock);
+
+ return ret;
+}
+
struct hns_roce_u_hw hns_roce_u_hw_v1 = {
.poll_cq = hns_roce_u_v1_poll_cq,
.arm_cq = hns_roce_u_v1_arm_cq,
+ .post_send = hns_roce_u_v1_post_send,
+ .post_recv = hns_roce_u_v1_post_recv,
.modify_qp = hns_roce_u_v1_modify_qp,
.destroy_qp = hns_roce_u_v1_destroy_qp,
};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
index b249f54..128c66f 100644
--- a/providers/hns/hns_roce_u_hw_v1.h
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -39,9 +39,15 @@
#define HNS_ROCE_CQE_IS_SQ 0
#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN 32
+#define HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM 2
enum {
+ HNS_ROCE_WQE_INLINE = 1 << 31,
+ HNS_ROCE_WQE_SE = 1 << 30,
+ HNS_ROCE_WQE_SGE_NUM_BIT = 24,
HNS_ROCE_WQE_IMM = 1 << 23,
+ HNS_ROCE_WQE_FENCE = 1 << 21,
+ HNS_ROCE_WQE_CQ_NOTIFY = 1 << 20,
HNS_ROCE_WQE_OPCODE_SEND = 0 << 16,
HNS_ROCE_WQE_OPCODE_RDMA_READ = 1 << 16,
HNS_ROCE_WQE_OPCODE_RDMA_WRITE = 2 << 16,
@@ -52,6 +58,20 @@ enum {
struct hns_roce_wqe_ctrl_seg {
__be32 sgl_pa_h;
__be32 flag;
+ __be32 imm_data;
+ __be32 msg_length;
+};
+
+struct hns_roce_wqe_data_seg {
+ __be64 addr;
+ __be32 lkey;
+ __be32 len;
+};
+
+struct hns_roce_wqe_raddr_seg {
+ __be32 rkey;
+ __be32 len;
+ __be64 raddr;
};
enum {
@@ -102,6 +122,43 @@ struct hns_roce_cq_db {
#define CQ_DB_U32_8_HW_SYNC_S 31
+struct hns_roce_rq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+
+#define RQ_DB_U32_4_RQ_HEAD_S 0
+#define RQ_DB_U32_4_RQ_HEAD_M (((1UL << 15) - 1) << RQ_DB_U32_4_RQ_HEAD_S)
+
+#define RQ_DB_U32_8_QPN_S 0
+#define RQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << RQ_DB_U32_8_QPN_S)
+
+#define RQ_DB_U32_8_CMD_S 28
+#define RQ_DB_U32_8_CMD_M (((1UL << 3) - 1) << RQ_DB_U32_8_CMD_S)
+
+#define RQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_sq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+
+#define SQ_DB_U32_4_SQ_HEAD_S 0
+#define SQ_DB_U32_4_SQ_HEAD_M (((1UL << 15) - 1) << SQ_DB_U32_4_SQ_HEAD_S)
+
+#define SQ_DB_U32_4_SL_S 16
+#define SQ_DB_U32_4_SL_M (((1UL << 2) - 1) << SQ_DB_U32_4_SL_S)
+
+#define SQ_DB_U32_4_PORT_S 18
+#define SQ_DB_U32_4_PORT_M (((1UL << 3) - 1) << SQ_DB_U32_4_PORT_S)
+
+#define SQ_DB_U32_4_DIRECT_WQE_S 31
+
+#define SQ_DB_U32_8_QPN_S 0
+#define SQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << SQ_DB_U32_8_QPN_S)
+
+#define SQ_DB_U32_8_HW_SYNC 31
+
struct hns_roce_cqe {
unsigned int cqe_byte_4;
union {
@@ -160,4 +217,26 @@ struct hns_roce_rc_send_wqe {
unsigned int length1;
};
+struct hns_roce_rc_rq_wqe {
+ unsigned int u32_0;
+ unsigned int sgl_ba_31_0;
+ unsigned int u32_2;
+ unsigned int rvd_5;
+ unsigned int rvd_6;
+ unsigned int rvd_7;
+ unsigned int rvd_8;
+ unsigned int rvd_9;
+
+ uint64_t va0;
+ unsigned int l_key0;
+ unsigned int length0;
+
+ uint64_t va1;
+ unsigned int l_key1;
+ unsigned int length1;
+};
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_S 16
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_M \
+ (((1UL << 6) - 1) << RC_RQ_WQE_NUMBER_OF_DATA_SEG_S)
+
#endif /* _HNS_ROCE_U_HW_V1_H */
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v5 rdma-core 5/7] libhns: Add verbs of qp support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch introduces the QP-related verbs for the hns userspace
library, including:
1. create_qp
2. query_qp
3. modify_qp
4. destroy_qp
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5/v4/v3:
- No change over v2
v2:
- Delete the min() and use the ccan header
v1:
- Initial submission
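
For readers following the create_qp path, the queue sizing in this patch can be sketched in isolation. This is a minimal sketch, not the provider code: round_up_wqe_cnt() and log2_ceil() are hypothetical names, MIN_WQE_NUM only mirrors HNS_ROCE_MIN_WQE_NUM, and the assert() guard is an addition that the patch does not have. It shows how a requested depth is rounded up to a power of two, and how the log2 value passed to the kernel as log_sq_bb_count would be derived from it.

```c
#include <assert.h>

#define MIN_WQE_NUM 0x20	/* mirrors HNS_ROCE_MIN_WQE_NUM (assumption) */

/* Round a requested queue depth up to a power of two, at least MIN_WQE_NUM. */
static unsigned int round_up_wqe_cnt(unsigned int req)
{
	unsigned int nent;

	/* Guard added for this sketch: a larger req would overflow nent. */
	assert(req <= (1U << 31));

	for (nent = MIN_WQE_NUM; nent < req; nent <<= 1)
		;

	return nent;
}

/* Smallest shift such that (1 << shift) >= cnt. */
static unsigned int log2_ceil(unsigned int cnt)
{
	unsigned int shift;

	for (shift = 0; cnt > (1U << shift); ++shift)
		;

	return shift;
}
```

Under these assumptions, a request for 100 send WRs is rounded up to 128 entries, and log2_ceil(128) gives 7.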
---
providers/hns/hns_roce_u.c | 5 +
providers/hns/hns_roce_u.h | 45 +++++++
providers/hns/hns_roce_u_abi.h | 8 ++
providers/hns/hns_roce_u_hw_v1.c | 155 +++++++++++++++++++++++
providers/hns/hns_roce_u_verbs.c | 259 ++++++++++++++++++++++++++++++++++++++-
5 files changed, 471 insertions(+), 1 deletion(-)
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 1877218..de2fd57 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -127,6 +127,11 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.cq_event = hns_roce_u_cq_event;
context->ibv_ctx.ops.destroy_cq = hns_roce_u_destroy_cq;
+ context->ibv_ctx.ops.create_qp = hns_roce_u_create_qp;
+ context->ibv_ctx.ops.query_qp = hns_roce_u_query_qp;
+ context->ibv_ctx.ops.modify_qp = hr_dev->u_hw->modify_qp;
+ context->ibv_ctx.ops.destroy_qp = hr_dev->u_hw->destroy_qp;
+
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index c3e364d..02b9251 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -44,6 +44,7 @@
#define HNS_ROCE_MAX_CQ_NUM 0x10000
#define HNS_ROCE_MIN_CQE_NUM 0x40
+#define HNS_ROCE_MIN_WQE_NUM 0x20
#define HNS_ROCE_CQ_DB_BUF_SIZE ((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
#define HNS_ROCE_TPTR_OFFSET 0x1000
#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
@@ -128,10 +129,29 @@ struct hns_roce_cq {
int arm_sn;
};
+struct hns_roce_srq {
+ struct ibv_srq ibv_srq;
+ struct hns_roce_buf buf;
+ pthread_spinlock_t lock;
+ unsigned long *wrid;
+ unsigned int srqn;
+ int max;
+ unsigned int max_gs;
+ int wqe_shift;
+ int head;
+ int tail;
+ unsigned int *db;
+ unsigned short counter;
+};
+
struct hns_roce_wq {
unsigned long *wrid;
+ pthread_spinlock_t lock;
unsigned int wqe_cnt;
+ int max_post;
+ unsigned int head;
unsigned int tail;
+ unsigned int max_gs;
int wqe_shift;
int offset;
};
@@ -139,14 +159,21 @@ struct hns_roce_wq {
struct hns_roce_qp {
struct ibv_qp ibv_qp;
struct hns_roce_buf buf;
+ int max_inline_data;
+ int buf_size;
unsigned int sq_signal_bits;
struct hns_roce_wq sq;
struct hns_roce_wq rq;
+ int port_num;
+ int sl;
};
struct hns_roce_u_hw {
int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+ int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask);
+ int (*destroy_qp)(struct ibv_qp *ibqp);
};
static inline unsigned long align(unsigned long val, unsigned long align)
@@ -174,6 +201,16 @@ static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
}
+static inline struct hns_roce_srq *to_hr_srq(struct ibv_srq *ibv_srq)
+{
+ return container_of(ibv_srq, struct hns_roce_srq, ibv_srq);
+}
+
+static inline struct hns_roce_qp *to_hr_qp(struct ibv_qp *ibv_qp)
+{
+ return container_of(ibv_qp, struct hns_roce_qp, ibv_qp);
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr);
int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -193,10 +230,18 @@ struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
int hns_roce_u_destroy_cq(struct ibv_cq *cq);
void hns_roce_u_cq_event(struct ibv_cq *cq);
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr);
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask, struct ibv_qp_init_attr *init_attr);
+
int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
int page_size);
void hns_roce_free_buf(struct hns_roce_buf *buf);
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp);
+
extern struct hns_roce_u_hw hns_roce_u_hw_v1;
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 1e62a7e..e78f967 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -58,4 +58,12 @@ struct hns_roce_create_cq_resp {
__u32 reserved;
};
+struct hns_roce_create_qp {
+ struct ibv_create_qp ibv_cmd;
+ __u64 buf_addr;
+ __u8 log_sq_bb_count;
+ __u8 log_sq_stride;
+ __u8 reserved[5];
+};
+
#endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index 39a67b1..e5c7f6a 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -149,6 +149,16 @@ static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
}
}
+static void hns_roce_clear_qp(struct hns_roce_context *ctx, uint32_t qpn)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (!--ctx->qp_table[tind].refcnt)
+ free(ctx->qp_table[tind].table);
+ else
+ ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL;
+}
+
static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
{
@@ -362,7 +372,152 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
return 0;
}
+static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
+ struct hns_roce_srq *srq)
+{
+ int nfreed = 0;
+ uint32_t prod_index;
+ uint8_t owner_bit = 0;
+ struct hns_roce_cqe *cqe, *dest;
+ struct hns_roce_context *ctx = to_hr_ctx(cq->ibv_cq.context);
+
+ for (prod_index = cq->cons_index; get_sw_cqe(cq, prod_index);
+ ++prod_index)
+ if (prod_index == cq->cons_index + cq->ibv_cq.cqe)
+ break;
+
+ while ((int) --prod_index - (int) cq->cons_index >= 0) {
+ cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
+ if ((roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S) & 0xffffff) == qpn) {
+ ++nfreed;
+ } else if (nfreed) {
+ dest = get_cqe(cq,
+ (prod_index + nfreed) & cq->ibv_cq.cqe);
+ owner_bit = roce_get_bit(dest->cqe_byte_4,
+ CQE_BYTE_4_OWNER_S);
+ memcpy(dest, cqe, sizeof(*cqe));
+ roce_set_bit(dest->cqe_byte_4, CQE_BYTE_4_OWNER_S,
+ owner_bit);
+ }
+ }
+
+ if (nfreed) {
+ cq->cons_index += nfreed;
+ wmb();
+ hns_roce_update_cq_cons_index(ctx, cq);
+ }
+}
+
+static void hns_roce_v1_cq_clean(struct hns_roce_cq *cq, unsigned int qpn,
+ struct hns_roce_srq *srq)
+{
+ pthread_spin_lock(&cq->lock);
+ __hns_roce_v1_cq_clean(cq, qpn, srq);
+ pthread_spin_unlock(&cq->lock);
+}
+
+static int hns_roce_u_v1_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask)
+{
+ int ret;
+ struct ibv_modify_qp cmd;
+ struct hns_roce_qp *hr_qp = to_hr_qp(qp);
+
+ ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+ if (!ret && (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RESET) {
+ hns_roce_v1_cq_clean(to_hr_cq(qp->recv_cq), qp->qp_num,
+ qp->srq ? to_hr_srq(qp->srq) : NULL);
+ if (qp->send_cq != qp->recv_cq)
+ hns_roce_v1_cq_clean(to_hr_cq(qp->send_cq), qp->qp_num,
+ NULL);
+
+ hns_roce_init_qp_indices(to_hr_qp(qp));
+ }
+
+ if (!ret && (attr_mask & IBV_QP_PORT)) {
+ hr_qp->port_num = attr->port_num;
+ printf("hr_qp->port_num= 0x%x\n", hr_qp->port_num);
+ }
+
+ hr_qp->sl = attr->ah_attr.sl;
+
+ return ret;
+}
+
+static void hns_roce_lock_cqs(struct ibv_qp *qp)
+{
+ struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+ struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+ if (send_cq == recv_cq) {
+ pthread_spin_lock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_lock(&send_cq->lock);
+ pthread_spin_lock(&recv_cq->lock);
+ } else {
+ pthread_spin_lock(&recv_cq->lock);
+ pthread_spin_lock(&send_cq->lock);
+ }
+}
+
+static void hns_roce_unlock_cqs(struct ibv_qp *qp)
+{
+ struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+ struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+ if (send_cq == recv_cq) {
+ pthread_spin_unlock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_unlock(&recv_cq->lock);
+ pthread_spin_unlock(&send_cq->lock);
+ } else {
+ pthread_spin_unlock(&send_cq->lock);
+ pthread_spin_unlock(&recv_cq->lock);
+ }
+}
+
+static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
+{
+ int ret;
+ struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+ pthread_mutex_lock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+ ret = ibv_cmd_destroy_qp(ibqp);
+ if (ret) {
+ pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+ return ret;
+ }
+
+ hns_roce_lock_cqs(ibqp);
+
+ __hns_roce_v1_cq_clean(to_hr_cq(ibqp->recv_cq), ibqp->qp_num,
+ ibqp->srq ? to_hr_srq(ibqp->srq) : NULL);
+
+ if (ibqp->send_cq != ibqp->recv_cq)
+ __hns_roce_v1_cq_clean(to_hr_cq(ibqp->send_cq), ibqp->qp_num,
+ NULL);
+
+ hns_roce_clear_qp(to_hr_ctx(ibqp->context), ibqp->qp_num);
+
+ hns_roce_unlock_cqs(ibqp);
+ pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+
+ free(qp->sq.wrid);
+ if (qp->rq.wqe_cnt)
+ free(qp->rq.wrid);
+
+ hns_roce_free_buf(&qp->buf);
+ free(qp);
+
+ return ret;
+}
+
struct hns_roce_u_hw hns_roce_u_hw_v1 = {
.poll_cq = hns_roce_u_v1_poll_cq,
.arm_cq = hns_roce_u_v1_arm_cq,
+ .modify_qp = hns_roce_u_v1_modify_qp,
+ .destroy_qp = hns_roce_u_v1_destroy_qp,
};
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index c9324dd..0b8f444 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -38,11 +38,19 @@
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
-
+#include <ccan/minmax.h>
#include "hns_roce_u.h"
#include "hns_roce_u_abi.h"
#include "hns_roce_u_hw_v1.h"
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp)
+{
+ qp->sq.head = 0;
+ qp->sq.tail = 0;
+ qp->rq.head = 0;
+ qp->rq.tail = 0;
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr)
{
@@ -163,6 +171,29 @@ static int align_cq_size(int req)
return nent;
}
+static int align_qp_size(int req)
+{
+ int nent;
+
+ for (nent = HNS_ROCE_MIN_WQE_NUM; nent < req; nent <<= 1)
+ ;
+
+ return nent;
+}
+
+static void hns_roce_set_sq_sizes(struct hns_roce_qp *qp,
+ struct ibv_qp_cap *cap, enum ibv_qp_type type)
+{
+ struct hns_roce_context *ctx = to_hr_ctx(qp->ibv_qp.context);
+
+ qp->sq.max_gs = 2;
+ cap->max_send_sge = min(ctx->max_sge, qp->sq.max_gs);
+ qp->sq.max_post = min(ctx->max_qp_wr, qp->sq.wqe_cnt);
+ cap->max_send_wr = qp->sq.max_post;
+ qp->max_inline_data = 32;
+ cap->max_inline_data = qp->max_inline_data;
+}
+
static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
{
if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
@@ -189,6 +220,17 @@ static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
return 0;
}
+static void hns_roce_calc_sq_wqe_size(struct ibv_qp_cap *cap,
+ enum ibv_qp_type type,
+ struct hns_roce_qp *qp)
+{
+ int size = sizeof(struct hns_roce_rc_send_wqe);
+
+ for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
+ qp->sq.wqe_shift++)
+ ;
+}
+
struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
struct ibv_comp_channel *channel,
int comp_vector)
@@ -266,3 +308,218 @@ int hns_roce_u_destroy_cq(struct ibv_cq *cq)
return ret;
}
+
+static int hns_roce_verify_qp(struct ibv_qp_init_attr *attr,
+ struct hns_roce_context *context)
+{
+ if (attr->cap.max_send_wr < HNS_ROCE_MIN_WQE_NUM) {
+ fprintf(stderr,
+ "max_send_wr = %d, less than minimum WQE number.\n",
+ attr->cap.max_send_wr);
+ attr->cap.max_send_wr = HNS_ROCE_MIN_WQE_NUM;
+ }
+
+ if (attr->cap.max_recv_wr < HNS_ROCE_MIN_WQE_NUM) {
+ fprintf(stderr,
+ "max_recv_wr = %d, less than minimum WQE number.\n",
+ attr->cap.max_recv_wr);
+ attr->cap.max_recv_wr = HNS_ROCE_MIN_WQE_NUM;
+ }
+
+ if (attr->cap.max_recv_sge < 1)
+ attr->cap.max_recv_sge = 1;
+ if (attr->cap.max_send_wr > context->max_qp_wr ||
+ attr->cap.max_recv_wr > context->max_qp_wr ||
+ attr->cap.max_send_sge > context->max_sge ||
+ attr->cap.max_recv_sge > context->max_sge)
+ return -1;
+
+ if ((attr->qp_type != IBV_QPT_RC) && (attr->qp_type != IBV_QPT_UD))
+ return -1;
+
+ if ((attr->qp_type == IBV_QPT_RC) &&
+ (attr->cap.max_inline_data > HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN))
+ return -1;
+
+ if (attr->qp_type == IBV_QPT_UC)
+ return -1;
+
+ return 0;
+}
+
+static int hns_roce_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
+ enum ibv_qp_type type, struct hns_roce_qp *qp)
+{
+ qp->sq.wrid =
+ (unsigned long *)malloc(qp->sq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->sq.wrid)
+ return -1;
+
+ if (qp->rq.wqe_cnt) {
+ qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->rq.wrid) {
+ free(qp->sq.wrid);
+ return -1;
+ }
+ }
+
+ for (qp->rq.wqe_shift = 4;
+ 1 << qp->rq.wqe_shift < sizeof(struct hns_roce_rc_send_wqe);
+ qp->rq.wqe_shift++)
+ ;
+
+ qp->buf_size = align((qp->sq.wqe_cnt << qp->sq.wqe_shift), 0x1000) +
+ (qp->rq.wqe_cnt << qp->rq.wqe_shift);
+
+ if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
+ qp->rq.offset = 0;
+ qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
+ } else {
+ qp->rq.offset = align((qp->sq.wqe_cnt << qp->sq.wqe_shift),
+ 0x1000);
+ qp->sq.offset = 0;
+ }
+
+ if (hns_roce_alloc_buf(&qp->buf, align(qp->buf_size, 0x1000),
+ to_hr_dev(pd->context->device)->page_size)) {
+ free(qp->sq.wrid);
+ free(qp->rq.wrid);
+ return -1;
+ }
+
+ memset(qp->buf.buf, 0, qp->buf_size);
+
+ return 0;
+}
+
+static int hns_roce_store_qp(struct hns_roce_context *ctx, uint32_t qpn,
+ struct hns_roce_qp *qp)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (!ctx->qp_table[tind].refcnt) {
+ ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1,
+ sizeof(struct hns_roce_qp *));
+ if (!ctx->qp_table[tind].table)
+ return -1;
+ }
+
+ ++ctx->qp_table[tind].refcnt;
+ ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp;
+
+ return 0;
+}
+
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr)
+{
+ int ret;
+ struct hns_roce_qp *qp = NULL;
+ struct hns_roce_create_qp cmd;
+ struct ibv_create_qp_resp resp;
+ struct hns_roce_context *context = to_hr_ctx(pd->context);
+
+ if (hns_roce_verify_qp(attr, context)) {
+ fprintf(stderr, "hns_roce_verify_qp failed!\n");
+ return NULL;
+ }
+
+ qp = malloc(sizeof(*qp));
+ if (!qp) {
+ fprintf(stderr, "malloc failed!\n");
+ return NULL;
+ }
+
+ hns_roce_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
+ qp->sq.wqe_cnt = align_qp_size(attr->cap.max_send_wr);
+ qp->rq.wqe_cnt = align_qp_size(attr->cap.max_recv_wr);
+
+ if (hns_roce_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp)) {
+ fprintf(stderr, "hns_roce_alloc_qp_buf failed!\n");
+ goto err;
+ }
+
+ hns_roce_init_qp_indices(qp);
+
+ if (pthread_spin_init(&qp->sq.lock, PTHREAD_PROCESS_PRIVATE) ||
+ pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) {
+ fprintf(stderr, "pthread_spin_init failed!\n");
+ goto err_free;
+ }
+
+ cmd.buf_addr = (uintptr_t) qp->buf.buf;
+ cmd.log_sq_stride = qp->sq.wqe_shift;
+ for (cmd.log_sq_bb_count = 0; qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count;
+ ++cmd.log_sq_bb_count)
+ ;
+
+ memset(cmd.reserved, 0, sizeof(cmd.reserved));
+
+ pthread_mutex_lock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+ ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd,
+ sizeof(cmd), &resp, sizeof(resp));
+ if (ret) {
+ fprintf(stderr, "ibv_cmd_create_qp failed!\n");
+ goto err_rq_db;
+ }
+
+ ret = hns_roce_store_qp(to_hr_ctx(pd->context), qp->ibv_qp.qp_num, qp);
+ if (ret) {
+ fprintf(stderr, "hns_roce_store_qp failed!\n");
+ goto err_destroy;
+ }
+ pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+ qp->rq.wqe_cnt = attr->cap.max_recv_wr;
+ qp->rq.max_gs = attr->cap.max_recv_sge;
+
+ /* adjust rq maxima to not exceed reported device maxima */
+ attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
+ attr->cap.max_recv_sge = min(context->max_sge, attr->cap.max_recv_sge);
+
+ qp->rq.max_post = attr->cap.max_recv_wr;
+ hns_roce_set_sq_sizes(qp, &attr->cap, attr->qp_type);
+
+ qp->sq_signal_bits = attr->sq_sig_all ? 0 : 1;
+
+ return &qp->ibv_qp;
+
+err_destroy:
+ ibv_cmd_destroy_qp(&qp->ibv_qp);
+
+err_rq_db:
+ pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+err_free:
+ free(qp->sq.wrid);
+ if (qp->rq.wqe_cnt)
+ free(qp->rq.wrid);
+ hns_roce_free_buf(&qp->buf);
+
+err:
+ free(qp);
+
+ return NULL;
+}
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+ int ret;
+ struct ibv_query_qp cmd;
+ struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+ ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd,
+ sizeof(cmd));
+ if (ret)
+ return ret;
+
+ init_attr->cap.max_send_wr = qp->sq.max_post;
+ init_attr->cap.max_send_sge = qp->sq.max_gs;
+ init_attr->cap.max_inline_data = qp->max_inline_data;
+
+ attr->cap = init_attr->cap;
+
+ return ret;
+}
--
1.9.1
^ permalink raw reply related
* [PATCH v5 rdma-core 4/7] libhns: Add verbs of cq support
From: Lijun Ou @ 2016-11-13 10:35 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1479033360-56035-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch introduces the CQ-related verbs for the hns userspace
library, including:
1. create_cq
2. poll_cq
3. req_notify_cq
4. cq_event
5. destroy_cq
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v5:
- Eliminate the warning seen when CFLAGS contains -m32
v4:
- Eliminate the warning reported by Travis CI testing
v3:
- No change over v2
v2:
- Delete the unused code
v1:
- Initial submission
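
The roce_get_field()/roce_set_field() helpers added to hns_roce_u.h by this patch are easier to review with a standalone example. The sketch below copies the two macros as they appear in the diff; the EX_* field layout, ex_pack() and ex_self_check() are hypothetical and exist only for illustration. It packs a 24-bit QPN and a 3-bit command into one 32-bit word and reads them back.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hns_roce_u.h hunk in this patch. */
#define roce_get_field(origin, mask, shift) \
	(((origin) & (mask)) >> (shift))

#define roce_set_field(origin, mask, shift, val) \
	do { \
		(origin) &= (~(mask)); \
		(origin) |= (((unsigned int)(val) << (shift)) & (mask)); \
	} while (0)

/* Hypothetical layout: QPN in bits 0..23, command in bits 28..30. */
#define EX_QPN_S 0
#define EX_QPN_M (((1UL << 24) - 1) << EX_QPN_S)
#define EX_CMD_S 28
#define EX_CMD_M (((1UL << 3) - 1) << EX_CMD_S)

/* Build one 32-bit word from the two fields. */
static uint32_t ex_pack(uint32_t qpn, uint32_t cmd)
{
	uint32_t word = 0;

	roce_set_field(word, EX_QPN_M, EX_QPN_S, qpn);
	roce_set_field(word, EX_CMD_M, EX_CMD_S, cmd);

	return word;
}

/* Round trip: the values written are the values read back. */
static void ex_self_check(void)
{
	uint32_t word = ex_pack(0x123456, 5);

	assert(roce_get_field(word, EX_QPN_M, EX_QPN_S) == 0x123456);
	assert(roce_get_field(word, EX_CMD_M, EX_CMD_S) == 5);
}
```

Because the value is masked after shifting, bits that do not fit the field are dropped silently rather than spilling into a neighbouring field.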
---
providers/hns/hns_roce_u.c | 57 +++++-
providers/hns/hns_roce_u.h | 94 ++++++++++
providers/hns/hns_roce_u_abi.h | 12 ++
providers/hns/hns_roce_u_buf.c | 61 +++++++
providers/hns/hns_roce_u_db.h | 54 ++++++
providers/hns/hns_roce_u_hw_v1.c | 368 +++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u_hw_v1.h | 163 +++++++++++++++++
providers/hns/hns_roce_u_verbs.c | 116 ++++++++++++
8 files changed, 920 insertions(+), 5 deletions(-)
create mode 100644 providers/hns/hns_roce_u_buf.c
create mode 100644 providers/hns/hns_roce_u_db.h
create mode 100644 providers/hns/hns_roce_u_hw_v1.c
create mode 100644 providers/hns/hns_roce_u_hw_v1.h
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 53e2720..1877218 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -46,15 +46,19 @@
static const struct {
char hid[HID_LEN];
+ void *data;
+ int version;
} acpi_table[] = {
- {"acpi:HISI00D1:"},
- {},
+ {"acpi:HISI00D1:", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
+ {},
};
static const struct {
char compatible[DEV_MATCH_LEN];
+ void *data;
+ int version;
} dt_table[] = {
- {"hisilicon,hns-roce-v1"},
+ {"hisilicon,hns-roce-v1", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
{},
};
@@ -93,6 +97,21 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
goto err_free;
}
+ if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+ /*
+ * When vma->vm_pgoff is 1, cq_tptr_base covers 64K CQs;
+ * each CQ pointer needs 2 bytes.
+ */
+ context->cq_tptr_base = mmap(NULL, HNS_ROCE_CQ_DB_BUF_SIZE,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ cmd_fd, HNS_ROCE_TPTR_OFFSET);
+ if (context->cq_tptr_base == MAP_FAILED) {
+ fprintf(stderr,
+ PFX "Warning: Failed to mmap cq_tptr page.\n");
+ goto db_free;
+ }
+ }
+
pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
context->ibv_ctx.ops.query_device = hns_roce_u_query_device;
@@ -102,6 +121,12 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.reg_mr = hns_roce_u_reg_mr;
context->ibv_ctx.ops.dereg_mr = hns_roce_u_dereg_mr;
+ context->ibv_ctx.ops.create_cq = hns_roce_u_create_cq;
+ context->ibv_ctx.ops.poll_cq = hr_dev->u_hw->poll_cq;
+ context->ibv_ctx.ops.req_notify_cq = hr_dev->u_hw->arm_cq;
+ context->ibv_ctx.ops.cq_event = hns_roce_u_cq_event;
+ context->ibv_ctx.ops.destroy_cq = hns_roce_u_destroy_cq;
+
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
@@ -112,6 +137,16 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
return &context->ibv_ctx;
tptr_free:
+ if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+ if (munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE))
+ fprintf(stderr, PFX "Warning: Munmap tptr failed.\n");
+ context->cq_tptr_base = NULL;
+ }
+
+db_free:
+ munmap(context->uar, to_hr_dev(ibdev)->page_size);
+ context->uar = NULL;
+
err_free:
free(context);
return NULL;
@@ -122,6 +157,8 @@ static void hns_roce_free_context(struct ibv_context *ibctx)
struct hns_roce_context *context = to_hr_ctx(ibctx);
munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+ if (to_hr_dev(ibctx->device)->hw_version == HNS_ROCE_HW_VER1)
+ munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE);
context->uar = NULL;
@@ -140,18 +177,26 @@ static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
struct hns_roce_device *dev;
char value[128];
int i;
+ void *u_hw;
+ int hw_version;
if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
value, sizeof(value)) > 0)
for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
- if (!strcmp(value, acpi_table[i].hid))
+ if (!strcmp(value, acpi_table[i].hid)) {
+ u_hw = acpi_table[i].data;
+ hw_version = acpi_table[i].version;
goto found;
+ }
if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
value, sizeof(value)) > 0)
for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
- if (!strcmp(value, dt_table[i].compatible))
+ if (!strcmp(value, dt_table[i].compatible)) {
+ u_hw = dt_table[i].data;
+ hw_version = dt_table[i].version;
goto found;
+ }
return NULL;
@@ -164,6 +209,8 @@ found:
}
dev->ibv_dev.ops = hns_roce_dev_ops;
+ dev->u_hw = (struct hns_roce_u_hw *)u_hw;
+ dev->hw_version = hw_version;
dev->page_size = sysconf(_SC_PAGESIZE);
return &dev->ibv_dev;
}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 5b73794..c3e364d 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -40,18 +40,53 @@
#include <infiniband/verbs.h>
#include <ccan/container_of.h>
+#define HNS_ROCE_CQE_ENTRY_SIZE 0x20
+
+#define HNS_ROCE_MAX_CQ_NUM 0x10000
+#define HNS_ROCE_MIN_CQE_NUM 0x40
+#define HNS_ROCE_CQ_DB_BUF_SIZE ((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
+#define HNS_ROCE_TPTR_OFFSET 0x1000
#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
#define PFX "hns: "
+#define roce_get_field(origin, mask, shift) \
+ (((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+ roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+ do { \
+ (origin) &= (~(mask)); \
+ (origin) |= (((unsigned int)(val) << (shift)) & (mask)); \
+ } while (0)
+
+#define roce_set_bit(origin, shift, val) \
+ roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
enum {
HNS_ROCE_QP_TABLE_BITS = 8,
HNS_ROCE_QP_TABLE_SIZE = 1 << HNS_ROCE_QP_TABLE_BITS,
};
+/* operation type list */
+enum {
+ /* rq&srq operation */
+ HNS_ROCE_OPCODE_SEND_DATA_RECEIVE = 0x06,
+ HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE = 0x07,
+};
+
struct hns_roce_device {
struct ibv_device ibv_dev;
int page_size;
+ struct hns_roce_u_hw *u_hw;
+ int hw_version;
+};
+
+struct hns_roce_buf {
+ void *buf;
+ unsigned int length;
};
struct hns_roce_context {
@@ -59,7 +94,10 @@ struct hns_roce_context {
void *uar;
pthread_spinlock_t uar_lock;
+ void *cq_tptr_base;
+
struct {
+ struct hns_roce_qp **table;
int refcnt;
} qp_table[HNS_ROCE_QP_TABLE_SIZE];
@@ -78,6 +116,44 @@ struct hns_roce_pd {
unsigned int pdn;
};
+struct hns_roce_cq {
+ struct ibv_cq ibv_cq;
+ struct hns_roce_buf buf;
+ pthread_spinlock_t lock;
+ unsigned int cqn;
+ unsigned int cq_depth;
+ unsigned int cons_index;
+ unsigned int *set_ci_db;
+ unsigned int *arm_db;
+ int arm_sn;
+};
+
+struct hns_roce_wq {
+ unsigned long *wrid;
+ unsigned int wqe_cnt;
+ unsigned int tail;
+ int wqe_shift;
+ int offset;
+};
+
+struct hns_roce_qp {
+ struct ibv_qp ibv_qp;
+ struct hns_roce_buf buf;
+ unsigned int sq_signal_bits;
+ struct hns_roce_wq sq;
+ struct hns_roce_wq rq;
+};
+
+struct hns_roce_u_hw {
+ int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
+ int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+};
+
+static inline unsigned long align(unsigned long val, unsigned long align)
+{
+ return (val + align - 1) & ~(align - 1);
+}
+
static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
{
return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -93,6 +169,11 @@ static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
}
+static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
+{
+ return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr);
int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -105,4 +186,17 @@ struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
int access);
int hns_roce_u_dereg_mr(struct ibv_mr *mr);
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector);
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq);
+void hns_roce_u_cq_event(struct ibv_cq *cq);
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+ int page_size);
+void hns_roce_free_buf(struct hns_roce_buf *buf);
+
+extern struct hns_roce_u_hw hns_roce_u_hw_v1;
+
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 0a0cd0c..1e62a7e 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -46,4 +46,16 @@ struct hns_roce_alloc_pd_resp {
__u32 reserved;
};
+struct hns_roce_create_cq {
+ struct ibv_create_cq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+};
+
+struct hns_roce_create_cq_resp {
+ struct ibv_create_cq_resp ibv_resp;
+ __u32 cqn;
+ __u32 reserved;
+};
+
#endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_buf.c b/providers/hns/hns_roce_u_buf.c
new file mode 100644
index 0000000..f92ea65
--- /dev/null
+++ b/providers/hns/hns_roce_u_buf.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <errno.h>
+#include <sys/mman.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+ int page_size)
+{
+ int ret;
+
+ buf->length = align(size, page_size);
+ buf->buf = mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (buf->buf == MAP_FAILED)
+ return errno;
+
+ ret = ibv_dontfork_range(buf->buf, size);
+ if (ret)
+ munmap(buf->buf, buf->length);
+
+ return ret;
+}
+
+void hns_roce_free_buf(struct hns_roce_buf *buf)
+{
+ ibv_dofork_range(buf->buf, buf->length);
+
+ munmap(buf->buf, buf->length);
+}
diff --git a/providers/hns/hns_roce_u_db.h b/providers/hns/hns_roce_u_db.h
new file mode 100644
index 0000000..76d13ce
--- /dev/null
+++ b/providers/hns/hns_roce_u_db.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/types.h>
+
+#include "hns_roce_u.h"
+
+#ifndef _HNS_ROCE_U_DB_H
+#define _HNS_ROCE_U_DB_H
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0])
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1])
+#else
+#error __BYTE_ORDER not defined
+#endif
+
+static inline void hns_roce_write64(uint32_t val[2],
+ struct hns_roce_context *ctx, int offset)
+{
+ *(volatile uint64_t *) (ctx->uar + offset) = HNS_ROCE_PAIR_TO_64(val);
+}
+
+#endif /* _HNS_ROCE_U_DB_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
new file mode 100644
index 0000000..39a67b1
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -0,0 +1,368 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <malloc.h>
+#include "hns_roce_u_db.h"
+#include "hns_roce_u_hw_v1.h"
+#include "hns_roce_u.h"
+
+static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
+ struct hns_roce_cq *cq)
+{
+ struct hns_roce_cq_db cq_db;
+
+ cq_db.u32_4 = 0;
+ cq_db.u32_8 = 0;
+
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+ CQ_DB_U32_8_CMD_MDF_S, 0);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+ cq->cqn);
+ roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+ CQ_DB_U32_4_CONS_IDX_S,
+ cq->cons_index & ((cq->cq_depth << 1) - 1));
+
+ hns_roce_write64((uint32_t *)&cq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_handle_error_cqe(struct hns_roce_cqe *cqe,
+ struct ibv_wc *wc)
+{
+ fprintf(stderr, PFX "error cqe!\n");
+ switch (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) &
+ HNS_ROCE_CQE_STATUS_MASK) {
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR:
+ wc->status = IBV_WC_LOC_LEN_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR:
+ wc->status = IBV_WC_LOC_QP_OP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR:
+ wc->status = IBV_WC_LOC_PROT_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR:
+ wc->status = IBV_WC_WR_FLUSH_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR:
+ wc->status = IBV_WC_MW_BIND_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR:
+ wc->status = IBV_WC_BAD_RESP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+ wc->status = IBV_WC_LOC_ACCESS_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
+ wc->status = IBV_WC_REM_INV_REQ_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+ wc->status = IBV_WC_REM_ACCESS_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR:
+ wc->status = IBV_WC_REM_OP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RETRY_EXC_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+}
+
+static struct hns_roce_cqe *get_cqe(struct hns_roce_cq *cq, int entry)
+{
+ return cq->buf.buf + entry * HNS_ROCE_CQE_ENTRY_SIZE;
+}
+
+static void *get_sw_cqe(struct hns_roce_cq *cq, int n)
+{
+ struct hns_roce_cqe *cqe = get_cqe(cq, n & cq->ibv_cq.cqe);
+
+ return (!!(roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_OWNER_S)) ^
+ !!(n & (cq->ibv_cq.cqe + 1))) ? cqe : NULL;
+}
+
+static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
+{
+ return get_sw_cqe(cq, cq->cons_index);
+}
+
+static void *get_send_wqe(struct hns_roce_qp *qp, int n)
+{
+ if ((n < 0) || (n >= qp->sq.wqe_cnt)) {
+ printf("sq wqe index:%d,sq wqe cnt:%d\r\n", n, qp->sq.wqe_cnt);
+ return NULL;
+ }
+
+ return (void *)(qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift));
+}
+
+static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
+ uint32_t qpn)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (ctx->qp_table[tind].refcnt) {
+ return ctx->qp_table[tind].table[qpn & ctx->qp_table_mask];
+ } else {
+ printf("hns_roce_find_qp fail!\n");
+ return NULL;
+ }
+}
+
+static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
+ struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
+{
+ uint32_t qpn;
+ int is_send;
+ uint16_t wqe_ctr;
+ uint32_t local_qpn;
+ struct hns_roce_wq *wq = NULL;
+ struct hns_roce_cqe *cqe = NULL;
+ struct hns_roce_wqe_ctrl_seg *sq_wqe = NULL;
+
+ /* Find the CQE that corresponds to the current consumer index (CI) */
+ cqe = next_cqe_sw(cq);
+ if (!cqe)
+ return CQ_EMPTY;
+
+ /* Advance CI past the CQE that is about to be consumed */
+ ++cq->cons_index;
+
+ rmb();
+
+ qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S);
+
+ is_send = (roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_SQ_RQ_FLAG_S) ==
+ HNS_ROCE_CQE_IS_SQ);
+
+ local_qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S);
+
+ /* Look up the QP if none is cached or the cached one does not match */
+ if (!*cur_qp ||
+ (local_qpn & HNS_ROCE_CQE_QPN_MASK) != (*cur_qp)->ibv_qp.qp_num) {
+
+ *cur_qp = hns_roce_find_qp(to_hr_ctx(cq->ibv_cq.context),
+ qpn & 0xffffff);
+ if (!*cur_qp) {
+ fprintf(stderr, PFX "can't find qp!\n");
+ return CQ_POLL_ERR;
+ }
+ }
+ wc->qp_num = qpn & 0xffffff;
+
+ if (is_send) {
+ wq = &(*cur_qp)->sq;
+ /*
+ * If sq_signal_bits is set, first move the tail pointer to
+ * the WQE that corresponds to the current CQE.
+ */
+ if ((*cur_qp)->sq_signal_bits) {
+ wqe_ctr = (uint16_t)(roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_WQE_INDEX_M,
+ CQE_BYTE_4_WQE_INDEX_S));
+ /*
+ * wq->tail only ever increases; when it exceeds
+ * 32 bits it wraps around to 0 and keeps counting.
+ */
+ wq->tail += (wqe_ctr - (uint16_t) wq->tail) &
+ (wq->wqe_cnt - 1);
+ }
+ /* write the wr_id of wq into the wc */
+ wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+ ++wq->tail;
+ } else {
+ wq = &(*cur_qp)->rq;
+ wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+ ++wq->tail;
+ }
+
+ /*
+ * HW reports the completion status in the CQE. For an error CQE, set
+ * the corresponding wc status and return immediately.
+ */
+ if (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) != HNS_ROCE_CQE_SUCCESS) {
+ hns_roce_handle_error_cqe(cqe, wc);
+ return CQ_OK;
+ }
+ wc->status = IBV_WC_SUCCESS;
+
+ /*
+ * According to the opcode type of cqe, mark the opcode and other
+ * information of wc
+ */
+ if (is_send) {
+ /* For a send, get the opcode and flags from the WQE itself */
+ sq_wqe = (struct hns_roce_wqe_ctrl_seg *)
+ get_send_wqe(*cur_qp, roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_WQE_INDEX_M,
+ CQE_BYTE_4_WQE_INDEX_S));
+ switch (sq_wqe->flag & HNS_ROCE_WQE_OPCODE_MASK) {
+ case HNS_ROCE_WQE_OPCODE_SEND:
+ wc->opcode = IBV_WC_SEND;
+ break;
+ case HNS_ROCE_WQE_OPCODE_RDMA_READ:
+ wc->opcode = IBV_WC_RDMA_READ;
+ wc->byte_len = cqe->byte_cnt;
+ break;
+ case HNS_ROCE_WQE_OPCODE_RDMA_WRITE:
+ wc->opcode = IBV_WC_RDMA_WRITE;
+ break;
+ case HNS_ROCE_WQE_OPCODE_BIND_MW2:
+ wc->opcode = IBV_WC_BIND_MW;
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+ wc->wc_flags = (sq_wqe->flag & HNS_ROCE_WQE_IMM ?
+ IBV_WC_WITH_IMM : 0);
+ } else {
+ /* Get opcode and flag in rq&srq */
+ wc->byte_len = (cqe->byte_cnt);
+
+ switch (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_OPERATION_TYPE_M,
+ CQE_BYTE_4_OPERATION_TYPE_S) &
+ HNS_ROCE_CQE_OPCODE_MASK) {
+ case HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE:
+ wc->opcode = IBV_WC_RECV_RDMA_WITH_IMM;
+ wc->wc_flags = IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->immediate_data;
+ break;
+ case HNS_ROCE_OPCODE_SEND_DATA_RECEIVE:
+ if (roce_get_bit(cqe->cqe_byte_4,
+ CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S)) {
+ wc->opcode = IBV_WC_RECV;
+ wc->wc_flags = IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->immediate_data;
+ } else {
+ wc->opcode = IBV_WC_RECV;
+ wc->wc_flags = 0;
+ }
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+ }
+
+ return CQ_OK;
+}
+
+static int hns_roce_u_v1_poll_cq(struct ibv_cq *ibvcq, int ne,
+ struct ibv_wc *wc)
+{
+ int npolled;
+ int err = CQ_OK;
+ struct hns_roce_qp *qp = NULL;
+ struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvcq->context);
+ struct hns_roce_device *dev = to_hr_dev(ibvcq->context->device);
+
+ pthread_spin_lock(&cq->lock);
+
+ for (npolled = 0; npolled < ne; ++npolled) {
+ err = hns_roce_v1_poll_one(cq, &qp, wc + npolled);
+ if (err != CQ_OK)
+ break;
+ }
+
+ if (npolled) {
+ if (dev->hw_version == HNS_ROCE_HW_VER1) {
+ *cq->set_ci_db = (unsigned short)(cq->cons_index &
+ ((cq->cq_depth << 1) - 1));
+ mb();
+ }
+
+ hns_roce_update_cq_cons_index(ctx, cq);
+ }
+
+ pthread_spin_unlock(&cq->lock);
+
+ return err == CQ_POLL_ERR ? err : npolled;
+}
+
+/**
+ * hns_roce_u_v1_arm_cq - request completion notification on a CQ
+ * @ibvcq: The completion queue to request notification for.
+ * @solicited: If non-zero, an event will be generated only for
+ * the next solicited CQ entry. If zero, any CQ entry,
+ * solicited or not, will generate an event.
+ */
+static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
+{
+ uint32_t ci;
+ uint32_t solicited_flag;
+ struct hns_roce_cq_db cq_db;
+ struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+
+ ci = cq->cons_index & ((cq->cq_depth << 1) - 1);
+ solicited_flag = solicited ? HNS_ROCE_CQ_DB_REQ_SOL :
+ HNS_ROCE_CQ_DB_REQ_NEXT;
+
+ cq_db.u32_4 = 0;
+ cq_db.u32_8 = 0;
+
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+ CQ_DB_U32_8_CMD_MDF_S, 1);
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_NOTIFY_TYPE_S, solicited_flag);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+ cq->cqn);
+ roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+ CQ_DB_U32_4_CONS_IDX_S, ci);
+
+ hns_roce_write64((uint32_t *)&cq_db, to_hr_ctx(ibvcq->context),
+ ROCEE_DB_OTHERS_L_0_REG);
+ return 0;
+}
+
+struct hns_roce_u_hw hns_roce_u_hw_v1 = {
+ .poll_cq = hns_roce_u_v1_poll_cq,
+ .arm_cq = hns_roce_u_v1_arm_cq,
+};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
new file mode 100644
index 0000000..b249f54
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_HW_V1_H
+#define _HNS_ROCE_U_HW_V1_H
+
+#define HNS_ROCE_CQ_DB_REQ_SOL 1
+#define HNS_ROCE_CQ_DB_REQ_NEXT 0
+
+#define HNS_ROCE_CQE_IS_SQ 0
+
+#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN 32
+
+enum {
+ HNS_ROCE_WQE_IMM = 1 << 23,
+ HNS_ROCE_WQE_OPCODE_SEND = 0 << 16,
+ HNS_ROCE_WQE_OPCODE_RDMA_READ = 1 << 16,
+ HNS_ROCE_WQE_OPCODE_RDMA_WRITE = 2 << 16,
+ HNS_ROCE_WQE_OPCODE_BIND_MW2 = 6 << 16,
+ HNS_ROCE_WQE_OPCODE_MASK = 15 << 16,
+};
+
+struct hns_roce_wqe_ctrl_seg {
+ __be32 sgl_pa_h;
+ __be32 flag;
+};
+
+enum {
+ CQ_OK = 0,
+ CQ_EMPTY = -1,
+ CQ_POLL_ERR = -2,
+};
+
+enum {
+ HNS_ROCE_CQE_QPN_MASK = 0x3ffff,
+ HNS_ROCE_CQE_STATUS_MASK = 0x1f,
+ HNS_ROCE_CQE_OPCODE_MASK = 0xf,
+};
+
+enum {
+ HNS_ROCE_CQE_SUCCESS,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR,
+ HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR,
+ HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR,
+ HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR,
+ HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR,
+ HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR,
+};
+
+struct hns_roce_cq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+#define CQ_DB_U32_4_CONS_IDX_S 0
+#define CQ_DB_U32_4_CONS_IDX_M (((1UL << 16) - 1) << CQ_DB_U32_4_CONS_IDX_S)
+
+#define CQ_DB_U32_8_CQN_S 0
+#define CQ_DB_U32_8_CQN_M (((1UL << 16) - 1) << CQ_DB_U32_8_CQN_S)
+
+#define CQ_DB_U32_8_NOTIFY_TYPE_S 16
+
+#define CQ_DB_U32_8_CMD_MDF_S 24
+#define CQ_DB_U32_8_CMD_MDF_M (((1UL << 4) - 1) << CQ_DB_U32_8_CMD_MDF_S)
+
+#define CQ_DB_U32_8_CMD_S 28
+#define CQ_DB_U32_8_CMD_M (((1UL << 3) - 1) << CQ_DB_U32_8_CMD_S)
+
+#define CQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_cqe {
+ unsigned int cqe_byte_4;
+ union {
+ unsigned int r_key;
+ unsigned int immediate_data;
+ };
+ unsigned int byte_cnt;
+ unsigned int cqe_byte_16;
+ unsigned int cqe_byte_20;
+ unsigned int s_mac_l;
+ unsigned int cqe_byte_28;
+ unsigned int reserved;
+};
+#define CQE_BYTE_4_OPERATION_TYPE_S 0
+#define CQE_BYTE_4_OPERATION_TYPE_M \
+ (((1UL << 4) - 1) << CQE_BYTE_4_OPERATION_TYPE_S)
+
+#define CQE_BYTE_4_OWNER_S 7
+
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_S 8
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_M \
+ (((1UL << 5) - 1) << CQE_BYTE_4_STATUS_OF_THE_OPERATION_S)
+
+#define CQE_BYTE_4_SQ_RQ_FLAG_S 14
+
+#define CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S 15
+
+#define CQE_BYTE_4_WQE_INDEX_S 16
+#define CQE_BYTE_4_WQE_INDEX_M (((1UL << 14) - 1) << CQE_BYTE_4_WQE_INDEX_S)
+
+#define CQE_BYTE_16_LOCAL_QPN_S 0
+#define CQE_BYTE_16_LOCAL_QPN_M (((1UL << 24) - 1) << CQE_BYTE_16_LOCAL_QPN_S)
+
+#define ROCEE_DB_SQ_L_0_REG 0x230
+
+#define ROCEE_DB_OTHERS_L_0_REG 0x238
+
+struct hns_roce_rc_send_wqe {
+ unsigned int sgl_ba_31_0;
+ unsigned int u32_1;
+ union {
+ unsigned int r_key;
+ unsigned int immediate_data;
+ };
+ unsigned int msg_length;
+ unsigned int rvd_3;
+ unsigned int rvd_4;
+ unsigned int rvd_5;
+ unsigned int rvd_6;
+ uint64_t va0;
+ unsigned int l_key0;
+ unsigned int length0;
+
+ uint64_t va1;
+ unsigned int l_key1;
+ unsigned int length1;
+};
+
+#endif /* _HNS_ROCE_U_HW_V1_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index c163d3c..c9324dd 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -40,6 +40,8 @@
#include <unistd.h>
#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+#include "hns_roce_u_hw_v1.h"
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr)
@@ -150,3 +152,117 @@ int hns_roce_u_dereg_mr(struct ibv_mr *mr)
return ret;
}
+
+static int align_cq_size(int req)
+{
+ int nent;
+
+ for (nent = HNS_ROCE_MIN_CQE_NUM; nent < req; nent <<= 1)
+ ;
+
+ return nent;
+}
+
+static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
+{
+ if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
+ fprintf(stderr, "cqe = %d, less than minimum CQE number.\n",
+ *cqe);
+ *cqe = HNS_ROCE_MIN_CQE_NUM;
+ }
+
+ if (*cqe > context->max_cqe)
+ return -1;
+
+ return 0;
+}
+
+static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
+ struct hns_roce_buf *buf, int nent)
+{
+ if (hns_roce_alloc_buf(buf,
+ align(nent * HNS_ROCE_CQE_ENTRY_SIZE, dev->page_size),
+ dev->page_size))
+ return -1;
+ memset(buf->buf, 0, nent * HNS_ROCE_CQE_ENTRY_SIZE);
+
+ return 0;
+}
+
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector)
+{
+ struct hns_roce_create_cq cmd;
+ struct hns_roce_create_cq_resp resp;
+ struct hns_roce_cq *cq;
+ int ret;
+
+ if (hns_roce_verify_cq(&cqe, to_hr_ctx(context)))
+ return NULL;
+
+ cq = malloc(sizeof(*cq));
+ if (!cq)
+ return NULL;
+
+ cq->cons_index = 0;
+
+ if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
+ goto err;
+
+ cqe = align_cq_size(cqe);
+
+ if (hns_roce_alloc_cq_buf(to_hr_dev(context->device), &cq->buf, cqe))
+ goto err;
+
+ cmd.buf_addr = (uintptr_t) cq->buf.buf;
+
+ ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
+ &cq->ibv_cq, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (ret)
+ goto err_db;
+
+ cq->cqn = resp.cqn;
+ cq->cq_depth = cqe;
+
+ if (to_hr_dev(context->device)->hw_version == HNS_ROCE_HW_VER1)
+ cq->set_ci_db = to_hr_ctx(context)->cq_tptr_base + cq->cqn * 2;
+ else
+ cq->set_ci_db = to_hr_ctx(context)->uar +
+ ROCEE_DB_OTHERS_L_0_REG;
+
+ cq->arm_db = cq->set_ci_db;
+ cq->arm_sn = 1;
+ *(cq->set_ci_db) = 0;
+ *(cq->arm_db) = 0;
+
+ return &cq->ibv_cq;
+
+err_db:
+ hns_roce_free_buf(&cq->buf);
+
+err:
+ free(cq);
+
+ return NULL;
+}
+
+void hns_roce_u_cq_event(struct ibv_cq *cq)
+{
+ to_hr_cq(cq)->arm_sn++;
+}
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq)
+{
+ int ret;
+
+ ret = ibv_cmd_destroy_cq(cq);
+ if (ret)
+ return ret;
+
+ hns_roce_free_buf(&to_hr_cq(cq)->buf);
+ free(to_hr_cq(cq));
+
+ return ret;
+}
--
1.9.1
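
A note for readers of the CQ code above: get_sw_cqe() and align_cq_size()
both rely on the CQ depth being a power of two. The following is only a
minimal sketch of that idea, not the provider's code. The names sketch_cq,
sketch_align_cq_size and sketch_cqe_is_sw_owned are hypothetical, and the
one-byte-per-entry owner array is an assumption that stands in for the owner
bit of a real CQE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CQ model; not the structures from the patch. */
struct sketch_cq {
	uint8_t *owner;   /* stand-in for the owner bit of each CQE */
	uint32_t depth;   /* number of entries, must be a power of two */
};

/*
 * Round req up to the next power of two, starting from min_entries,
 * in the same way as the loop in align_cq_size().
 */
static uint32_t sketch_align_cq_size(uint32_t req, uint32_t min_entries)
{
	uint32_t nent;

	/* The doubling loop only yields powers of two if the start is one. */
	assert(min_entries != 0 && (min_entries & (min_entries - 1)) == 0);

	for (nent = min_entries; nent < req; nent <<= 1)
		;

	return nent;
}

/*
 * Return 1 if entry n (a free-running index) holds a completion that
 * software may consume. The bit of n just above the index mask flips on
 * every pass over the ring, so a valid entry is one whose owner bit
 * differs from that pass parity.
 */
static int sketch_cqe_is_sw_owned(const struct sketch_cq *cq, uint32_t n)
{
	uint32_t mask = cq->depth - 1;
	int owner = cq->owner[n & mask] & 1;
	int pass_parity = !!(n & cq->depth);

	return owner ^ pass_parity;
}
```

With a zero-filled ring, entry 0 is not consumable until its owner bit is
written as 1 on the first pass; on the second pass the same slot becomes
consumable again only once the bit has been flipped back to 0.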
^ permalink raw reply related