Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* [PATCH rdma-next 2/2] IB/mlx5: Add support for CQE compressing
From: Leon Romanovsky @ 2016-10-31 10:16 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909005-12867-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

CQE compressing reduces PCI overhead by coalescing and compressing
multiple CQEs into a single merged CQE. Successful compressing
improves message rate especially for small packet traffic.

CQE compressing is supported for all 64B CQE formats (with certain
limitations) generated by RQ/Responder or by SQ/Requestor.

Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/cq.c | 30 +++++++++++++++++++++++++++++-
 include/uapi/rdma/mlx5-abi.h    |  4 +++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 79d017b..9ac36ef 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -731,7 +731,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 			  int entries, u32 **cqb,
 			  int *cqe_size, int *index, int *inlen)
 {
-	struct mlx5_ib_create_cq ucmd;
+	struct mlx5_ib_create_cq ucmd = {};
 	size_t ucmdlen;
 	int page_shift;
 	__be64 *pas;
@@ -792,8 +792,36 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 
 	*index = to_mucontext(context)->uuari.uars[0].index;
 
+	if (ucmd.cqe_comp_en == 1) {
+		if (unlikely((*cqe_size != 64) ||
+			     !MLX5_CAP_GEN(dev->mdev, cqe_compression))) {
+			err = -EOPNOTSUPP;
+			mlx5_ib_warn(dev, "CQE compression is not supported for size %d!\n",
+				     *cqe_size);
+			goto err_cqb;
+		}
+
+		if (unlikely(!ucmd.cqe_comp_res_format ||
+			     !(ucmd.cqe_comp_res_format <
+			       MLX5_IB_CQE_RES_RESERVED) ||
+			     (ucmd.cqe_comp_res_format &
+			      (ucmd.cqe_comp_res_format - 1)))) {
+			err = -EOPNOTSUPP;
+			mlx5_ib_warn(dev, "CQE compression res format %d is not supported!\n",
+				     ucmd.cqe_comp_res_format);
+			goto err_cqb;
+		}
+
+		MLX5_SET(cqc, cqc, cqe_comp_en, 1);
+		MLX5_SET(cqc, cqc, mini_cqe_res_format,
+			 ilog2(ucmd.cqe_comp_res_format));
+	}
+
 	return 0;
 
+err_cqb:
+	kfree(cqb);
+
 err_db:
 	mlx5_ib_db_unmap_user(to_mucontext(context), &cq->db);
 
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 24a7435..208710b 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -147,7 +147,9 @@ struct mlx5_ib_create_cq {
 	__u64	buf_addr;
 	__u64	db_addr;
 	__u32	cqe_size;
-	__u32	reserved; /* explicit padding (optional on i386) */
+	__u8    cqe_comp_en;
+	__u8    cqe_comp_res_format;
+	__u16	reserved; /* explicit padding (optional on i386) */
 };
 
 struct mlx5_ib_create_cq_resp {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH rdma-next 0/4] Add packet pacing support for IB verbs
From: Leon Romanovsky @ 2016-10-31 10:21 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

When sending from a 10G host to a 1G host, it is easy to overrun the receiver,
leading to packet loss and traffic backing off. Similar problems occur when
a 10G host sends data to a sub-10G virtual circuit, or a 40G host sending
to a 10G host. Packet pacing could control packet injection rate and reduces
network congestion to maximize throughput & minimize network latency.

Packet pacing is a rate limiting and shaping for a QP (SQ for RAW QP), set
and change the rate is done by modifying QP. This series of patch made the
following high level changes:
 1. Report rate limit capabilities through user data. Reported capabilities
    include: The maximum and minimum rate limit in kbps supported by packet
    pacing; Bitmap showing which QP types are supported by packet pacing
    operation.
 2. Extend modify QP interface for growing attributes. Add rate limit support
    to the extended interface.
 3. Enable mlx5-based hardware to be able to update the rate limit for
    RAW QP packet.

Available in the "topic/packet_pacing" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git

Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/packet_pacing

Thanks,
  Bodong & Leon

Bodong Wang (4):
  IB/mlx5: Report mlx5 packet pacing capabilities when querying device
  IB/core: Support rate limit for packet pacing
  IB/uverbs: Extend modify_qp and support packet pacing
  IB/mlx5: Update the rate limit according to user setting for RAW QP

 drivers/infiniband/core/uverbs.h      |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 178 +++++++++++++++++++++-------------
 drivers/infiniband/core/uverbs_main.c |   1 +
 drivers/infiniband/core/verbs.c       |   2 +
 drivers/infiniband/hw/mlx5/main.c     |  16 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   1 +
 drivers/infiniband/hw/mlx5/qp.c       |  71 ++++++++++++--
 include/rdma/ib_verbs.h               |   2 +
 include/uapi/rdma/ib_user_verbs.h     |  12 +++
 include/uapi/rdma/mlx5-abi.h          |  13 +++
 10 files changed, 219 insertions(+), 78 deletions(-)

--
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH rdma-next 1/4] IB/mlx5: Report mlx5 packet pacing capabilities when querying device
From: Leon Romanovsky @ 2016-10-31 10:21 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Enable mlx5 based hardware to report packet pacing capabilities
from kernel to user space. Packet pacing allows to limit the rate to any
number between the maximum and minimum, based on user settings.

The capabilities are exposed to user space through query_device by uhw.
The following capabilities are reported:

1. The maximum and minimum rate limit in kbps supported by packet pacing.
2. Bitmap showing which QP types are supported by packet pacing operation.

Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 13 +++++++++++++
 include/uapi/rdma/mlx5-abi.h      | 13 +++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..ed9d327 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -669,6 +669,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 			1 << MLX5_CAP_GEN(dev->mdev, log_max_rq);
 	}
 
+	if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen)) {
+		if (MLX5_CAP_QOS(mdev, packet_pacing) &&
+		    MLX5_CAP_GEN(mdev, qos)) {
+			resp.packet_pacing_caps.qp_rate_limit_max =
+				MLX5_CAP_QOS(mdev, packet_pacing_max_rate);
+			resp.packet_pacing_caps.qp_rate_limit_min =
+				MLX5_CAP_QOS(mdev, packet_pacing_min_rate);
+			resp.packet_pacing_caps.supported_qpts |=
+				1 << IB_QPT_RAW_PACKET;
+		}
+		resp.response_length += sizeof(resp.packet_pacing_caps);
+	}
+
 	if (uhw->outlen) {
 		err = ib_copy_to_udata(uhw, &resp, resp.response_length);
 
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index f5d0f4e..4e9338d 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -124,11 +124,24 @@ struct mlx5_ib_rss_caps {
 	__u8 reserved[7];
 };
 
+struct mlx5_packet_pacing_caps {
+	__u32 qp_rate_limit_min;
+	__u32 qp_rate_limit_max; /* In kpbs */
+
+	/* Corresponding bit will be set if qp type from
+	 * 'enum ib_qp_type' is supported, e.g.
+	 * supported_qpts |= 1 << IB_QPT_RAW_PACKET
+	 */
+	__u32 supported_qpts;
+	__u32 reserved;
+};
+
 struct mlx5_ib_query_device_resp {
 	__u32	comp_mask;
 	__u32	response_length;
 	struct	mlx5_ib_tso_caps tso_caps;
 	struct	mlx5_ib_rss_caps rss_caps;
+	struct	mlx5_packet_pacing_caps packet_pacing_caps;
 };
 
 struct mlx5_ib_create_cq {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH rdma-next 2/4] IB/core: Support rate limit for packet pacing
From: Leon Romanovsky @ 2016-10-31 10:21 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add new member rate_limit to ib_qp_attr, it shows the packet pacing rate
in Kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask, and it could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.

Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 2 ++
 include/rdma/ib_verbs.h         | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..3e688b3 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1014,6 +1014,7 @@ static const struct {
 						 IB_QP_QKEY),
 				 [IB_QPT_GSI] = (IB_QP_CUR_STATE		|
 						 IB_QP_QKEY),
+				 [IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
 			 }
 		}
 	},
@@ -1047,6 +1048,7 @@ static const struct {
 						IB_QP_QKEY),
 				[IB_QPT_GSI] = (IB_QP_CUR_STATE			|
 						IB_QP_QKEY),
+				[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
 			}
 		},
 		[IB_QPS_SQD]   = {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..a065361 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1102,6 +1102,7 @@ enum ib_qp_attr_mask {
 	IB_QP_RESERVED2			= (1<<22),
 	IB_QP_RESERVED3			= (1<<23),
 	IB_QP_RESERVED4			= (1<<24),
+	IB_QP_RATE_LIMIT		= (1<<25),
 };
 
 enum ib_qp_state {
@@ -1151,6 +1152,7 @@ struct ib_qp_attr {
 	u8			rnr_retry;
 	u8			alt_port_num;
 	u8			alt_timeout;
+	u32			rate_limit;
 };
 
 enum ib_wr_opcode {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH rdma-next 3/4] IB/uverbs: Extend modify_qp and support packet pacing
From: Leon Romanovsky @ 2016-10-31 10:21 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

- Modify_qp is extended to support more attributes. Existing
  applications are not affected when calling modify_qp. New
  applications could call modify_qp_ex to use the extended fields.
- New member rate_limit is added to the modify_qp extended structure,
  users can modify it through modify_qp to control packet packing.

Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 178 +++++++++++++++++++++-------------
 drivers/infiniband/core/uverbs_main.c |   1 +
 include/uapi/rdma/ib_user_verbs.h     |  12 +++
 4 files changed, 123 insertions(+), 69 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index df26a74..455034a 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -289,5 +289,6 @@ IB_UVERBS_DECLARE_EX_CMD(modify_wq);
 IB_UVERBS_DECLARE_EX_CMD(destroy_wq);
 IB_UVERBS_DECLARE_EX_CMD(create_rwq_ind_table);
 IB_UVERBS_DECLARE_EX_CMD(destroy_rwq_ind_table);
+IB_UVERBS_DECLARE_EX_CMD(modify_qp);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..14ceba3 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2328,94 +2328,86 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
 	}
 }
 
-ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
-			    struct ib_device *ib_dev,
-			    const char __user *buf, int in_len,
-			    int out_len)
+static int modify_qp(struct ib_uverbs_file *file,
+		     struct ib_uverbs_ex_modify_qp *cmd, struct ib_udata *udata)
 {
-	struct ib_uverbs_modify_qp cmd;
-	struct ib_udata            udata;
-	struct ib_qp              *qp;
-	struct ib_qp_attr         *attr;
-	int                        ret;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
-
-	INIT_UDATA(&udata, buf + sizeof cmd, NULL, in_len - sizeof cmd,
-		   out_len);
+	struct ib_qp_attr *attr;
+	struct ib_qp *qp;
+	int ret;
 
 	attr = kmalloc(sizeof *attr, GFP_KERNEL);
 	if (!attr)
 		return -ENOMEM;
 
-	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+	qp = idr_read_qp(cmd->base.qp_handle, file->ucontext);
 	if (!qp) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	attr->qp_state 		  = cmd.qp_state;
-	attr->cur_qp_state 	  = cmd.cur_qp_state;
-	attr->path_mtu 		  = cmd.path_mtu;
-	attr->path_mig_state 	  = cmd.path_mig_state;
-	attr->qkey 		  = cmd.qkey;
-	attr->rq_psn 		  = cmd.rq_psn;
-	attr->sq_psn 		  = cmd.sq_psn;
-	attr->dest_qp_num 	  = cmd.dest_qp_num;
-	attr->qp_access_flags 	  = cmd.qp_access_flags;
-	attr->pkey_index 	  = cmd.pkey_index;
-	attr->alt_pkey_index 	  = cmd.alt_pkey_index;
-	attr->en_sqd_async_notify = cmd.en_sqd_async_notify;
-	attr->max_rd_atomic 	  = cmd.max_rd_atomic;
-	attr->max_dest_rd_atomic  = cmd.max_dest_rd_atomic;
-	attr->min_rnr_timer 	  = cmd.min_rnr_timer;
-	attr->port_num 		  = cmd.port_num;
-	attr->timeout 		  = cmd.timeout;
-	attr->retry_cnt 	  = cmd.retry_cnt;
-	attr->rnr_retry 	  = cmd.rnr_retry;
-	attr->alt_port_num 	  = cmd.alt_port_num;
-	attr->alt_timeout 	  = cmd.alt_timeout;
-
-	memcpy(attr->ah_attr.grh.dgid.raw, cmd.dest.dgid, 16);
-	attr->ah_attr.grh.flow_label        = cmd.dest.flow_label;
-	attr->ah_attr.grh.sgid_index        = cmd.dest.sgid_index;
-	attr->ah_attr.grh.hop_limit         = cmd.dest.hop_limit;
-	attr->ah_attr.grh.traffic_class     = cmd.dest.traffic_class;
-	attr->ah_attr.dlid 	    	    = cmd.dest.dlid;
-	attr->ah_attr.sl   	    	    = cmd.dest.sl;
-	attr->ah_attr.src_path_bits 	    = cmd.dest.src_path_bits;
-	attr->ah_attr.static_rate   	    = cmd.dest.static_rate;
-	attr->ah_attr.ah_flags 	    	    = cmd.dest.is_global ? IB_AH_GRH : 0;
-	attr->ah_attr.port_num 	    	    = cmd.dest.port_num;
-
-	memcpy(attr->alt_ah_attr.grh.dgid.raw, cmd.alt_dest.dgid, 16);
-	attr->alt_ah_attr.grh.flow_label    = cmd.alt_dest.flow_label;
-	attr->alt_ah_attr.grh.sgid_index    = cmd.alt_dest.sgid_index;
-	attr->alt_ah_attr.grh.hop_limit     = cmd.alt_dest.hop_limit;
-	attr->alt_ah_attr.grh.traffic_class = cmd.alt_dest.traffic_class;
-	attr->alt_ah_attr.dlid 	    	    = cmd.alt_dest.dlid;
-	attr->alt_ah_attr.sl   	    	    = cmd.alt_dest.sl;
-	attr->alt_ah_attr.src_path_bits     = cmd.alt_dest.src_path_bits;
-	attr->alt_ah_attr.static_rate       = cmd.alt_dest.static_rate;
-	attr->alt_ah_attr.ah_flags 	    = cmd.alt_dest.is_global ? IB_AH_GRH : 0;
-	attr->alt_ah_attr.port_num 	    = cmd.alt_dest.port_num;
+	attr->qp_state		  = cmd->base.qp_state;
+	attr->cur_qp_state	  = cmd->base.cur_qp_state;
+	attr->path_mtu		  = cmd->base.path_mtu;
+	attr->path_mig_state	  = cmd->base.path_mig_state;
+	attr->qkey		  = cmd->base.qkey;
+	attr->rq_psn		  = cmd->base.rq_psn;
+	attr->sq_psn		  = cmd->base.sq_psn;
+	attr->dest_qp_num	  = cmd->base.dest_qp_num;
+	attr->qp_access_flags	  = cmd->base.qp_access_flags;
+	attr->pkey_index	  = cmd->base.pkey_index;
+	attr->alt_pkey_index	  = cmd->base.alt_pkey_index;
+	attr->en_sqd_async_notify = cmd->base.en_sqd_async_notify;
+	attr->max_rd_atomic	  = cmd->base.max_rd_atomic;
+	attr->max_dest_rd_atomic  = cmd->base.max_dest_rd_atomic;
+	attr->min_rnr_timer	  = cmd->base.min_rnr_timer;
+	attr->port_num		  = cmd->base.port_num;
+	attr->timeout		  = cmd->base.timeout;
+	attr->retry_cnt		  = cmd->base.retry_cnt;
+	attr->rnr_retry		  = cmd->base.rnr_retry;
+	attr->alt_port_num	  = cmd->base.alt_port_num;
+	attr->alt_timeout	  = cmd->base.alt_timeout;
+	attr->rate_limit	  = cmd->rate_limit;
+
+	memcpy(attr->ah_attr.grh.dgid.raw, cmd->base.dest.dgid, 16);
+	attr->ah_attr.grh.flow_label	= cmd->base.dest.flow_label;
+	attr->ah_attr.grh.sgid_index	= cmd->base.dest.sgid_index;
+	attr->ah_attr.grh.hop_limit	= cmd->base.dest.hop_limit;
+	attr->ah_attr.grh.traffic_class	= cmd->base.dest.traffic_class;
+	attr->ah_attr.dlid		= cmd->base.dest.dlid;
+	attr->ah_attr.sl		= cmd->base.dest.sl;
+	attr->ah_attr.src_path_bits	= cmd->base.dest.src_path_bits;
+	attr->ah_attr.static_rate	= cmd->base.dest.static_rate;
+	attr->ah_attr.ah_flags		= cmd->base.dest.is_global ?
+					  IB_AH_GRH : 0;
+	attr->ah_attr.port_num		= cmd->base.dest.port_num;
+
+	memcpy(attr->alt_ah_attr.grh.dgid.raw, cmd->base.alt_dest.dgid, 16);
+	attr->alt_ah_attr.grh.flow_label    = cmd->base.alt_dest.flow_label;
+	attr->alt_ah_attr.grh.sgid_index    = cmd->base.alt_dest.sgid_index;
+	attr->alt_ah_attr.grh.hop_limit     = cmd->base.alt_dest.hop_limit;
+	attr->alt_ah_attr.grh.traffic_class = cmd->base.alt_dest.traffic_class;
+	attr->alt_ah_attr.dlid		    = cmd->base.alt_dest.dlid;
+	attr->alt_ah_attr.sl		    = cmd->base.alt_dest.sl;
+	attr->alt_ah_attr.src_path_bits	    = cmd->base.alt_dest.src_path_bits;
+	attr->alt_ah_attr.static_rate	    = cmd->base.alt_dest.static_rate;
+	attr->alt_ah_attr.ah_flags	    = cmd->base.alt_dest.is_global ?
+					      IB_AH_GRH : 0;
+	attr->alt_ah_attr.port_num	    = cmd->base.alt_dest.port_num;
 
 	if (qp->real_qp == qp) {
-		ret = ib_resolve_eth_dmac(qp, attr, &cmd.attr_mask);
+		ret = ib_resolve_eth_dmac(qp, attr, &cmd->base.attr_mask);
 		if (ret)
 			goto release_qp;
 		ret = qp->device->modify_qp(qp, attr,
-			modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
+					    modify_qp_mask(qp->qp_type,
+							   cmd->base.attr_mask),
+					    udata);
 	} else {
-		ret = ib_modify_qp(qp, attr, modify_qp_mask(qp->qp_type, cmd.attr_mask));
+		ret = ib_modify_qp(qp, attr,
+				   modify_qp_mask(qp->qp_type,
+						  cmd->base.attr_mask));
 	}
 
-	if (ret)
-		goto release_qp;
-
-	ret = in_len;
-
 release_qp:
 	put_qp_read(qp);
 
@@ -2425,6 +2417,54 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 	return ret;
 }
 
+ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
+			    struct ib_device *ib_dev,
+			    const char __user *buf, int in_len,
+			    int out_len)
+{
+	struct ib_uverbs_ex_modify_qp cmd = {};
+	struct ib_udata udata;
+	int ret;
+
+	if (copy_from_user(&cmd.base, buf, sizeof(cmd.base)))
+		return -EFAULT;
+
+	INIT_UDATA(&udata, buf + sizeof(cmd.base), NULL,
+		   in_len - sizeof(cmd.base), out_len);
+
+	ret = modify_qp(file, &cmd, &udata);
+	if (ret)
+		return ret;
+
+	return in_len;
+}
+
+int ib_uverbs_ex_modify_qp(struct ib_uverbs_file *file,
+			   struct ib_device *ib_dev,
+			   struct ib_udata *ucore,
+			   struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_modify_qp cmd = {};
+	int ret;
+
+	if (ucore->inlen < sizeof(cmd.base))
+		return -EINVAL;
+
+	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (ret)
+		return ret;
+
+	if (ucore->inlen > sizeof(cmd)) {
+		if (ib_is_udata_cleared(ucore, sizeof(cmd),
+					ucore->inlen - sizeof(cmd)))
+			return -EOPNOTSUPP;
+	}
+
+	ret = modify_qp(file, &cmd, ucore);
+
+	return ret;
+}
+
 ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 			     struct ib_device *ib_dev,
 			     const char __user *buf, int in_len,
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 0012fa5..80839c3 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -137,6 +137,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_DESTROY_WQ]       = ib_uverbs_ex_destroy_wq,
 	[IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL] = ib_uverbs_ex_create_rwq_ind_table,
 	[IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL] = ib_uverbs_ex_destroy_rwq_ind_table,
+	[IB_USER_VERBS_EX_CMD_MODIFY_QP]        = ib_uverbs_ex_modify_qp,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25225eb..351f260 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -93,6 +93,7 @@ enum {
 	IB_USER_VERBS_EX_CMD_QUERY_DEVICE = IB_USER_VERBS_CMD_QUERY_DEVICE,
 	IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ,
 	IB_USER_VERBS_EX_CMD_CREATE_QP = IB_USER_VERBS_CMD_CREATE_QP,
+	IB_USER_VERBS_EX_CMD_MODIFY_QP = IB_USER_VERBS_CMD_MODIFY_QP,
 	IB_USER_VERBS_EX_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
 	IB_USER_VERBS_EX_CMD_DESTROY_FLOW,
 	IB_USER_VERBS_EX_CMD_CREATE_WQ,
@@ -684,9 +685,20 @@ struct ib_uverbs_modify_qp {
 	__u64 driver_data[0];
 };
 
+struct ib_uverbs_ex_modify_qp {
+	struct ib_uverbs_modify_qp base;
+	__u32	rate_limit;
+	__u32	reserved;
+};
+
 struct ib_uverbs_modify_qp_resp {
 };
 
+struct ib_uverbs_ex_modify_qp_resp {
+	__u32  comp_mask;
+	__u32  response_length;
+};
+
 struct ib_uverbs_destroy_qp {
 	__u64 response;
 	__u32 qp_handle;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH rdma-next 4/4] IB/mlx5: Update the rate limit according to user setting for RAW QP
From: Leon Romanovsky @ 2016-10-31 10:21 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

- Add MODIFY_QP_EX CMD to extend modify_qp.
- Rate limit will be updated in the following state transactions: RTR2RTS,
  RTS2RTS. The limit will be removed when SQ is in RST and ERR state.

Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    |  3 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c      | 71 ++++++++++++++++++++++++++++++++----
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index ed9d327..fd13786 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3025,7 +3025,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.uverbs_ex_cmd_mask =
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE)	|
 		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ)	|
-		(1ull << IB_USER_VERBS_EX_CMD_CREATE_QP);
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_QP)	|
+		(1ull << IB_USER_VERBS_EX_CMD_MODIFY_QP);
 
 	dev->ib_dev.query_device	= mlx5_ib_query_device;
 	dev->ib_dev.query_port		= mlx5_ib_query_port;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..0e43254 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -387,6 +387,7 @@ struct mlx5_ib_qp {
 	struct list_head	qps_list;
 	struct list_head	cq_recv_list;
 	struct list_head	cq_send_list;
+	u32			rate_limit;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 41f4c2a..148a26f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -78,6 +78,7 @@ struct mlx5_wqe_eth_pad {
 
 enum raw_qp_set_mask_map {
 	MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID		= 1UL << 0,
+	MLX5_RAW_QP_RATE_LIMIT			= 1UL << 1,
 };
 
 struct mlx5_modify_raw_qp_param {
@@ -85,6 +86,7 @@ struct mlx5_modify_raw_qp_param {
 
 	u32 set_mask; /* raw_qp_set_mask_map */
 	u8 rq_q_ctr_id;
+	u32 rate_limit;
 };
 
 static void get_cqs(enum ib_qp_type qp_type,
@@ -2443,8 +2445,14 @@ static int modify_raw_packet_qp_rq(struct mlx5_ib_dev *dev,
 }
 
 static int modify_raw_packet_qp_sq(struct mlx5_core_dev *dev,
-				   struct mlx5_ib_sq *sq, int new_state)
+				   struct mlx5_ib_sq *sq,
+				   int new_state,
+				   const struct mlx5_modify_raw_qp_param *raw_qp_param)
 {
+	struct mlx5_ib_qp *ibqp = sq->base.container_mibqp;
+	u32 old_rate = ibqp->rate_limit;
+	u32 new_rate = old_rate;
+	u16 rl_index = 0;
 	void *in;
 	void *sqc;
 	int inlen;
@@ -2460,10 +2468,42 @@ static int modify_raw_packet_qp_sq(struct mlx5_core_dev *dev,
 	sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx);
 	MLX5_SET(sqc, sqc, state, new_state);
 
+	if (raw_qp_param->set_mask & MLX5_RAW_QP_RATE_LIMIT)
+		new_rate = raw_qp_param->rate_limit;
+
+	if (old_rate != new_rate) {
+		if (new_rate) {
+			err = mlx5_rl_add_rate(dev, new_rate, &rl_index);
+			if (err) {
+				pr_err("Failed configuring rate %u: %d\n",
+				       new_rate, err);
+				goto out;
+			}
+		}
+
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);
+		MLX5_SET(sqc, sqc, packet_pacing_rate_limit_index, rl_index);
+	}
+
 	err = mlx5_core_modify_sq(dev, sq->base.mqp.qpn, in, inlen);
-	if (err)
+	if (err) {
+		/* Remove new rate from table if failed */
+		if (new_rate &&
+		    old_rate != new_rate)
+			mlx5_rl_remove_rate(dev, new_rate);
 		goto out;
+	}
+
+	if ((new_state == MLX5_SQC_STATE_ERR) ||
+	    (new_state == MLX5_SQC_STATE_RST))
+		new_rate = 0;
 
+	/* Only remove the old rate after new rate was set */
+	if (old_rate &&
+	    old_rate != new_rate)
+		mlx5_rl_remove_rate(dev, old_rate);
+
+	ibqp->rate_limit = new_rate;
 	sq->state = new_state;
 
 out:
@@ -2478,6 +2518,8 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 	struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
 	struct mlx5_ib_rq *rq = &raw_packet_qp->rq;
 	struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
+	int modify_rq = !!qp->rq.wqe_cnt;
+	int modify_sq = !!qp->sq.wqe_cnt;
 	int rq_state;
 	int sq_state;
 	int err;
@@ -2495,10 +2537,18 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 		rq_state = MLX5_RQC_STATE_RST;
 		sq_state = MLX5_SQC_STATE_RST;
 		break;
-	case MLX5_CMD_OP_INIT2INIT_QP:
-	case MLX5_CMD_OP_INIT2RTR_QP:
 	case MLX5_CMD_OP_RTR2RTS_QP:
 	case MLX5_CMD_OP_RTS2RTS_QP:
+		if (raw_qp_param->set_mask ==
+		    MLX5_RAW_QP_RATE_LIMIT) {
+			modify_rq = 0;
+			sq_state = sq->state;
+		} else {
+			return raw_qp_param->set_mask ? -EINVAL : 0;
+		}
+		break;
+	case MLX5_CMD_OP_INIT2INIT_QP:
+	case MLX5_CMD_OP_INIT2RTR_QP:
 		if (raw_qp_param->set_mask)
 			return -EINVAL;
 		else
@@ -2508,13 +2558,13 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 		return -EINVAL;
 	}
 
-	if (qp->rq.wqe_cnt) {
-		err = modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param);
+	if (modify_rq) {
+		err =  modify_raw_packet_qp_rq(dev, rq, rq_state, raw_qp_param);
 		if (err)
 			return err;
 	}
 
-	if (qp->sq.wqe_cnt) {
+	if (modify_sq) {
 		if (tx_affinity) {
 			err = modify_raw_packet_tx_affinity(dev->mdev, sq,
 							    tx_affinity);
@@ -2522,7 +2572,7 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 				return err;
 		}
 
-		return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state);
+		return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state, raw_qp_param);
 	}
 
 	return 0;
@@ -2777,6 +2827,11 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 			raw_qp_param.rq_q_ctr_id = mibport->q_cnt_id;
 			raw_qp_param.set_mask |= MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID;
 		}
+
+		if (attr_mask & IB_QP_RATE_LIMIT) {
+			raw_qp_param.rate_limit = attr->rate_limit;
+			raw_qp_param.set_mask |= MLX5_RAW_QP_RATE_LIMIT;
+		}
 		err = modify_raw_packet_qp(dev, qp, &raw_qp_param, tx_affinity);
 	} else {
 		err = mlx5_core_qp_modify(dev->mdev, op, optpar, context,
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Paolo Bonzini @ 2016-10-31 11:45 UTC (permalink / raw)
  To: Lorenzo Stoakes, linux-mm
  Cc: Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, linux-kernel,
	linux-cris-kernel, linux-ia64, dri-devel, linux-rdma, kvm,
	linux-media, devel
In-Reply-To: <20161031100228.17917-3-lstoakes@gmail.com>



On 31/10/2016 11:02, Lorenzo Stoakes wrote:
> - *
> - * get_user_pages should be phased out in favor of
> - * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
> - * should use get_user_pages because it cannot pass
> - * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.

This comment should be preserved in some way.  In addition, removing
get_user_pages_locked() makes it harder (compared to a simple "git grep
-w") to identify callers that lack allow-retry functionality).  So I'm
not sure about the benefits of these patches.

If all callers were changed, then sure removing the _locked suffix would
be a good idea.

Paolo

^ permalink raw reply

* (unknown), 
From: Debra_Farmer/SSB/HIDOE-CJ/7EgCimzc+ygjh5cb8Nw @ 2016-10-31 12:51 UTC (permalink / raw)



I am Mrs. Gu Kailai and i intend to make a DONATION. Contact my personal E-mail Via: mrsgukailai-SgvXgqTD8kc@public.gmane.org for more details:
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Lorenzo Stoakes @ 2016-10-31 13:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-mm, Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins,
	Dave Hansen, Rik van Riel, Mel Gorman, Andrew Morton,
	linux-kernel, linux-cris-kernel, linux-ia64, dri-devel,
	linux-rdma, kvm, linux-media, devel
In-Reply-To: <cc508436-156e-eb4b-ae01-b44f33c2d692@redhat.com>

On Mon, Oct 31, 2016 at 12:45:36PM +0100, Paolo Bonzini wrote:
>
>
> On 31/10/2016 11:02, Lorenzo Stoakes wrote:
> > - *
> > - * get_user_pages should be phased out in favor of
> > - * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
> > - * should use get_user_pages because it cannot pass
> > - * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
>
> This comment should be preserved in some way.  In addition, removing

Could you clarify what you think should be retained?

The comment seems to me to be largely rendered redundant by this change -
get_user_pages() now offers identical behaviour, and of course the latter part
of the comment ('because it cannot pass FAULT_FLAG_ALLOW_RETRY') is rendered
incorrect by this change.

It could be replaced with a recommendation to make use of VM_FAULT_RETRY logic
if possible. Note that the line above retains the reference to
'get_user_pages_fast()' - 'See also get_user_pages_fast, for performance
critical applications.'

One important thing to note here is this is a comment on get_user_pages_remote()
for which it was already incorrect syntactically (I'm guessing once
get_user_pages_remote() was added it 'stole' get_user_pages()'s comment :)

> get_user_pages_locked() makes it harder (compared to a simple "git grep
> -w") to identify callers that lack allow-retry functionality).  So I'm
> not sure about the benefits of these patches.
>

That's a fair point, and certainly this cleanup series is less obviously helpful
than previous ones.

However, there are a few points in its favour:

1. get_user_pages_remote() was the mirror of get_user_pages() prior to adding an
   int *locked parameter to the former (necessary to allow for the unexport of
   __get_user_pages_unlocked()), differing only in task/mm being specified and
   FOLL_REMOTE being set. This patch series keeps these functions 'synchronised'
   in this respect.

2. There is currently only one caller of get_user_pages_locked() in
   mm/frame_vector.c which seems to suggest this function isn't widely
   used/known.

3. This change results in all slow-path get_user_pages*() functions having the
   ability to use VM_FAULT_RETRY logic rather than 'defaulting' to using
   get_user_pages() that doesn't let you do this even if you wanted to.

4. It's somewhat confusing/redundant having both get_user_pages_locked() and
   get_user_pages() functions which both require mmap_sem to be held (i.e. both
   are 'locked' versions.)

> If all callers were changed, then sure removing the _locked suffix would
> be a good idea.

It seems many callers cannot release mmap_sem so VM_FAULT_RETRY logic couldn't
happen anyway in this cases and in these cases we couldn't change the caller.

Overall, an alternative here might be to remove get_user_pages() and update
get_user_pages_locked() to add a 'vmas' parameter, however doing this renders
get_user_pages_unlocked() asymmetric as it would lack an vmas parameter (adding
one there would make no sense as VM_FAULT_RETRY logic invalidates VMAs) though
perhaps not such a big issue as it makes sense as to why this is the case.

get_user_pages_unlocked() definitely seems to be a useful helper and therefore
makes sense to retain.

Of course another alternative is to leave things be :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Laurence Oberman @ 2016-10-31 13:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Konrad Rzeszutek Wilk, Roger Pau Monné,
	linux-block, linux-scsi, linux-rdma, linux-nvme
In-Reply-To: <fe057bf0-6ede-fc9e-9a53-9d790c9782c2@sandisk.com>



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Jens Axboe" <axboe@fb.com>
> Cc: "Christoph Hellwig" <hch@lst.de>, "James Bottomley" <jejb@linux.vnet.ibm.com>, "Martin K. Petersen"
> <martin.petersen@oracle.com>, "Mike Snitzer" <snitzer@redhat.com>, "Doug Ledford" <dledford@redhat.com>, "Keith
> Busch" <keith.busch@intel.com>, "Ming Lei" <tom.leiming@gmail.com>, "Konrad Rzeszutek Wilk"
> <konrad.wilk@oracle.com>, "Roger Pau Monné" <roger.pau@citrix.com>, "Laurence Oberman" <loberman@redhat.com>,
> linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
> Sent: Friday, October 28, 2016 8:23:40 PM
> Subject: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
> 
> Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
> QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
> that became superfluous because of this change. Change
> blk_queue_stopped() tests into blk_mq_queue_stopped().
> 
> This patch fixes a race condition: using queue_flag_clear_unlocked()
> is not safe if any other function that manipulates the queue flags
> can be called concurrently, e.g. blk_cleanup_queue().
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/core.c | 16 ++--------------
>  1 file changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fe15d94..45dd237 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -201,13 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct
> gendisk *disk)
>  
>  void nvme_requeue_req(struct request *req)
>  {
> -	unsigned long flags;
> -
> -	blk_mq_requeue_request(req, false);
> -	spin_lock_irqsave(req->q->queue_lock, flags);
> -	if (!blk_queue_stopped(req->q))
> -		blk_mq_kick_requeue_list(req->q);
> -	spin_unlock_irqrestore(req->q->queue_lock, flags);
> +	blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
>  }
>  EXPORT_SYMBOL_GPL(nvme_requeue_req);
>  
> @@ -2078,13 +2072,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns;
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
> -	list_for_each_entry(ns, &ctrl->namespaces, list) {
> -		spin_lock_irq(ns->queue->queue_lock);
> -		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
> -		spin_unlock_irq(ns->queue->queue_lock);
> -
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
>  		blk_mq_quiesce_queue(ns->queue);
> -	}
>  	mutex_unlock(&ctrl->namespaces_mutex);
>  }
>  EXPORT_SYMBOL_GPL(nvme_stop_queues);
> @@ -2095,7 +2084,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> -		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
>  		blk_mq_start_stopped_hw_queues(ns->queue, true);
>  		blk_mq_kick_requeue_list(ns->queue);
>  	}
> --
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hello Bart

Thanks for all this work.

Applied all 14 patches, also corrected the part of the xen-blkfront.c blkif_recover patch in patchv5-5/14.

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..60fff99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
                 BUG_ON(req->nr_phys_segments > segs);
                 blk_mq_requeue_request(req);
         }
+        blk_mq_start_stopped_hw_queues(infrq, true);                    *** Corrected
         blk_mq_kick_requeue_list(infrq);
 
         while ((bio = bio_list_pop(&infbio_list)) != NULL) {

Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx5 (100Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx4 (56Gbit)  with max_sectors_kb set to 1024, 2048, 4096 and 8196
Reset the SRP hosts multiple times with multipath set to no_path_retry queue
Ran basic NVME read/write testing with no hot plug disconnects on multiple block sizes

All tests passed.

For the series:
Tested-by: Laurence Oberman <loberman@redhat.com>

^ permalink raw reply related

* Re: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-31 13:59 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Konrad Rzeszutek Wilk, Roger Pau Monné,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <540193784.5466628.1477921998345.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On 10/31/2016 06:53 AM, Laurence Oberman wrote:
> Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx5 (100Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
> Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx4 (56Gbit)  with max_sectors_kb set to 1024, 2048, 4096 and 8196
> Reset the SRP hosts multiple times with multipath set to no_path_retry queue
> Ran basic NVME read/write testing with no hot plug disconnects on multiple block sizes
>
> All tests passed.
>
> For the series:
> Tested-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hello Laurence,

Thanks for having tested this version of this patch series again so quickly!

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-31 15:10 UTC (permalink / raw)
  To: loberman@redhat.com
  Cc: linux-block@vger.kernel.org, tom.leiming@gmail.com,
	linux-rdma@vger.kernel.org, snitzer@redhat.com, hch@lst.de,
	martin.petersen@oracle.com, konrad.wilk@oracle.com,
	roger.pau@citrix.com, axboe@fb.com, linux-scsi@vger.kernel.org,
	linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
	keith.busch@intel.com, dledford@redhat.com
In-Reply-To: <540193784.5466628.1477921998345.JavaMail.zimbra@redhat.com>

On Mon, 2016-10-31 at 09:53 -0400, Laurence Oberman wrote:
> Applied all 14 patches, also corrected the part of the xen-blkfront.c 
> blkif_recover patch in patchv5-5/14.
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-
> blkfront.c
> index 9908597..60fff99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info
> *info)
>                  BUG_ON(req->nr_phys_segments > segs);
>                  blk_mq_requeue_request(req);
>          }
> +        blk_mq_start_stopped_hw_queues(infrq,
> true);                    *** Corrected
>          blk_mq_kick_requeue_list(infrq);
>  
>          while ((bio = bio_list_pop(&infbio_list)) != NULL) {

Hello Laurence,

Sorry for the build failure. The way you changed xen-blkfront is indeed
what I intended. Apparently I forgot to enable Xen in my kernel config
...

Bart.

^ permalink raw reply

* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Or Gerlitz @ 2016-10-31 15:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	skc-YOWKrPYUwWM, Weiny, Ira, Jason Gunthorpe, John Fleck,
	Leon Romanovsky, Liran Liss, Matan Barak, Tzahi Oved
In-Reply-To: <alpine.DEB.2.20.1610301848560.15769-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>

On Mon, Oct 31, 2016 at 1:49 AM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:

> If you were invited to the KS or LPC then you can attend the RDMA
> workshop.

AFAIK LPC is by registration, not by invitation, so all LPC attendees
can come, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: RDMA developer gatherings around Kernel Summit and Linux Plumbers in Santa Fe
From: Christoph Lameter @ 2016-10-31 15:33 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	skc-YOWKrPYUwWM, Weiny, Ira, Jason Gunthorpe, John Fleck,
	Leon Romanovsky, Liran Liss, Matan Barak, Tzahi Oved
In-Reply-To: <CAJ3xEMh-Q_6tRV674vhNE5o4DYiECbpSEGL_au-jjTM4bUt0Lw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


On Mon, 31 Oct 2016, Or Gerlitz wrote:

> On Mon, Oct 31, 2016 at 1:49 AM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:
>
> > If you were invited to the KS or LPC then you can attend the RDMA
> > workshop.
>
> AFAIK LPC is by registration, not by invitation, so all LPC attendees
> can come, right?
>

Correct. There was a limit on the number of registrations and a number of
additional spots were available with a special inviate.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: iscsi_trx going into D state
From: Robert LeBlanc @ 2016-10-31 16:34 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Zhu Lingshan, linux-rdma, linux-scsi-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477780190.22703.47.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>

Nicholas,

Thanks for following up on this. We have been chasing other bugs in
our provisioning and as such has reduced our load on the boxes. We are
hoping to get that all straightened out this week and do some more
testing. So far we have not had any iSCSI in D state since the patch,
be we haven't been able to test it well either. We will keep you
updated.

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Sat, Oct 29, 2016 at 4:29 PM, Nicholas A. Bellinger
<nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org> wrote:
> Hi Robert,
>
> On Wed, 2016-10-19 at 10:41 -0600, Robert LeBlanc wrote:
>> Nicholas,
>>
>> I didn't have high hopes for the patch because we were not seeing
>> TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it
>> seemed to help regardless. Our clients finally OOMed from the hung
>> sessions, so we are having to reboot them and we will do some more
>> testing. We haven't put the updated kernel on our clients yet. Our
>> clients have iSCSI root disks so I'm not sure if we can get a vmcore
>> on those, but we will do what we can to get you a vmcore from the
>> target if it happens again.
>>
>
> Just checking in to see if you've observed further issues with
> iser-target ports, and/or able to generate a crashdump with v4.4.y..?
>
>> As far as our configuration: It is a superMicro box with 6 SAMSUNG
>> MZ7LM3T8HCJM-00005 SSDs. Two are for root and four are in mdadm
>> RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the
>> RAID-10 for checksum and snapshots only and we export ZVols to the
>> clients (one or more per VM on the client). We do not persist the
>> export info (targetcli saveconfig), but regenerate it from scripts.
>> The client receives two or more of these exports and puts them in a
>> RAID-1 device. The exports are served by iSER one one port and also by
>> normal iSCSI on a different port for compatibility, but not normally
>> used. If you need more info about the config, please let me know. It
>> was kind of a vague request so I'm not sure what exactly is important
>> to you.
>
> Thanks for the extra details of your hardware + user-space
> configuration.
>
>> Thanks for helping us with this,
>> Robert LeBlanc
>>
>> When we have problems, we usually see this in the logs:
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on
>> Network Portal 0.0.0.0:3260
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48.
>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed.
>>
>> I found some backtraces in the logs, not sure if this is helpful, this
>> is before your patch (your patch booted at Oct 18 10:36:59):
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this
>> GP) idle=b59/140000000000001/0 softirq=535/535 fqs=30992
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550
>> c=1549 q=0)
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5:
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2   R  running
>> task        0 17967      2 0x00000008
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq
>> isert_cq_work [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: ffff883f4c0dca80
>> 00000000af8ca7a4 ffff883f7fb43da8 ffffffff810ac83f
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000005
>> ffffffff81adb680 ffff883f7fb43dc0 ffffffff810af179
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000006
>> ffff883f7fb43df0 ffffffff810e1c10 ffff883f7fb57b80
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac83f>]
>> sched_show_task+0xaf/0x110
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810af179>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e1c10>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e6040>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810afd51>] ?
>> account_system_time+0x81/0x110
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9a0>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810eb4d9>]
>> update_process_times+0x39/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa755>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9dd>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec0c2>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec5a8>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8172343d>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff817216f7>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d70fe>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d74bc>]
>> vprintk_emit+0x2fc/0x500
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d783f>]
>> vprintk_default+0x1f/0x30
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81174c2a>] printk+0x5d/0x74
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814bc351>]
>> transport_lookup_cmd_lun+0x1d1/0x200
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814edcf0>]
>> iscsit_setup_scsi_cmd+0x230/0x540
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0890bf3>]
>> isert_rx_do_work+0x3f3/0x7f0 [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0891174>]
>> isert_cq_work+0x184/0x770 [ib_isert]
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109734f>]
>> process_one_work+0x14f/0x400
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097bc4>]
>> worker_thread+0x114/0x470
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8171c55a>] ?
>> __schedule+0x34a/0x7f0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097ab0>] ?
>> rescuer_thread+0x310/0x310
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d708>] kthread+0xd8/0xf0
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>> kthread_park+0x60/0x60
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81720c8f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ?
>> kthread_park+0x60/0x60
>>
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #01128-...: (5999 ticks this
>> GP) idle=2f9/140000000000001/0 softirq=457/457 fqs=4830
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=3546
>> c=3545 q=0)
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Task dump for CPU 28:
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: iscsi_np        R  running
>> task        0 16597      2 0x0000000c
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: ffff887f40350000
>> 00000000b98a67bb ffff887f7f503da8 ffffffff810ac8ff
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001c
>> ffffffff81adb680 ffff887f7f503dc0 ffffffff810af239
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001d
>> ffff887f7f503df0 ffffffff810e1cd0 ffff887f7f517b80
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>> sched_show_task+0xaf/0x110
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>> account_system_time+0x81/0x110
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>> update_process_times+0x39/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d757c>]
>> vprintk_emit+0x2fc/0x500
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d78ff>]
>> vprintk_default+0x1f/0x30
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81174dde>] printk+0x5d/0x74
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e71ad>]
>> iscsi_target_locate_portal+0x62d/0x6f0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e5100>]
>> iscsi_target_login_thread+0x6f0/0xfc0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e4a10>] ?
>> iscsi_target_login_sess_out+0x250/0x250
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>>
>> I don't think this one is related, but it happened a couple of times:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: INFO: rcu_sched
>> self-detected stall on CPU
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #01119-...: (5999 ticks this
>> GP) idle=727/140000000000001/0 softirq=1346/1346 fqs=4990
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=4295
>> c=4294 q=0)
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Task dump for CPU 19:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: kworker/19:1    R  running
>> task        0   301      2 0x00000008
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Workqueue:
>> events_power_efficient fb_flashcursor
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: ffff883f6009ca80
>> 00000000010a7cdd ffff883f7fcc3da8 ffffffff810ac8ff
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000013
>> ffffffff81adb680 ffff883f7fcc3dc0 ffffffff810af239
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000014
>> ffff883f7fcc3df0 ffffffff810e1cd0 ffff883f7fcd7b80
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Call Trace:
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <IRQ>  [<ffffffff810ac8ff>]
>> sched_show_task+0xaf/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810af239>]
>> dump_cpu_task+0x39/0x40
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>]
>> rcu_dump_cpu_stacks+0x80/0xb0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e6100>]
>> rcu_check_callbacks+0x540/0x820
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ?
>> account_system_time+0x81/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ?
>> tick_sched_do_timer+0x50/0x50
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810eb599>]
>> update_process_times+0x39/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810fa815>]
>> tick_sched_handle.isra.17+0x25/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa9d>]
>> tick_sched_timer+0x3d/0x70
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec182>]
>> __hrtimer_run_queues+0x102/0x290
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec668>]
>> hrtimer_interrupt+0xa8/0x1a0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81052c65>]
>> local_apic_timer_interrupt+0x35/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81723cbd>]
>> smp_apic_timer_interrupt+0x3d/0x50
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81721f77>]
>> apic_timer_interrupt+0x87/0x90
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <EOI>  [<ffffffff810d71be>]
>> ? console_unlock+0x41e/0x4e0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff813866ad>]
>> fb_flashcursor+0x5d/0x140
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8138bc00>] ?
>> bit_clear+0x110/0x110
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109740f>]
>> process_one_work+0x14f/0x400
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097c84>]
>> worker_thread+0x114/0x470
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8171cdda>] ?
>> __schedule+0x34a/0x7f0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097b70>] ?
>> rescuer_thread+0x310/0x310
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8172150f>]
>> ret_from_fork+0x3f/0x70
>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ?
>> kthread_park+0x60/0x60
>
> RCU self-detected schedule stalls typically mean some code is
> monopolizing execution on a specific CPU for an extended period of time
> (eg: endless loop), preventing normal RCU grace-period callbacks from
> running in a timely manner.
>
> It's hard to tell without more log context and/or crashdump what was
> going on here.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Paolo Bonzini @ 2016-10-31 17:55 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins,
	Dave Hansen, Rik van Riel, Mel Gorman, Andrew Morton,
	linux-kernel, linux-cris-kernel, linux-ia64, dri-devel,
	linux-rdma, kvm, linux-media, devel
In-Reply-To: <20161031134849.GA13609@lucifer>



On 31/10/2016 14:48, Lorenzo Stoakes wrote:
> On Mon, Oct 31, 2016 at 12:45:36PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 31/10/2016 11:02, Lorenzo Stoakes wrote:
>>> - *
>>> - * get_user_pages should be phased out in favor of
>>> - * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
>>> - * should use get_user_pages because it cannot pass
>>> - * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
>>
>> This comment should be preserved in some way.  In addition, removing
> 
> Could you clarify what you think should be retained?
> 
> The comment seems to me to be largely rendered redundant by this change -
> get_user_pages() now offers identical behaviour, and of course the latter part
> of the comment ('because it cannot pass FAULT_FLAG_ALLOW_RETRY') is rendered
> incorrect by this change.
> 
> It could be replaced with a recommendation to make use of VM_FAULT_RETRY logic
> if possible.

Yes, exactly.  locked == NULL should be avoided whenever mmap_sem can 
be dropped, and the comment indirectly said so.  Now most of those cases
actually are those where you can just call get_user_pages_unlocked.

>> get_user_pages_locked() makes it harder (compared to a simple "git grep
>> -w") to identify callers that lack allow-retry functionality).  So I'm
>> not sure about the benefits of these patches.
> 
> That's a fair point, and certainly this cleanup series is less obviously helpful
> than previous ones.
> 
> However, there are a few points in its favour:
> 
> 1. get_user_pages_remote() was the mirror of get_user_pages() prior to adding an
>    int *locked parameter to the former (necessary to allow for the unexport of
>    __get_user_pages_unlocked()), differing only in task/mm being specified and
>    FOLL_REMOTE being set. This patch series keeps these functions 'synchronised'
>    in this respect.
> 
> 2. There is currently only one caller of get_user_pages_locked() in
>    mm/frame_vector.c which seems to suggest this function isn't widely
>    used/known.

Or not widely necessary. :)

> 3. This change results in all slow-path get_user_pages*() functions having the
>    ability to use VM_FAULT_RETRY logic rather than 'defaulting' to using
>    get_user_pages() that doesn't let you do this even if you wanted to.

This is only true if someone does the work though.  From a quick look 
at your series, the following files can be trivially changed to use 
get_user_pages_unlocked:

- drivers/gpu/drm/via/via_dmablit.c
- drivers/platform/goldfish/goldfish_pipe.c
- drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
- drivers/rapidio/devices/rio_mport_cdev.c
- drivers/virt/fsl_hypervisor.c

Also, videobuf_dma_init_user could be changed to use retry by adding a 
*locked argument to videobuf_dma_init_user_locked, prototype patch 
after my signature.

Everything else is probably best kept using get_user_pages.

> 4. It's somewhat confusing/redundant having both get_user_pages_locked() and
>    get_user_pages() functions which both require mmap_sem to be held (i.e. both
>    are 'locked' versions.)
> 
>> If all callers were changed, then sure removing the _locked suffix would
>> be a good idea.
> 
> It seems many callers cannot release mmap_sem so VM_FAULT_RETRY logic couldn't
> happen anyway in this cases and in these cases we couldn't change the caller.

But then get_user_pages_locked remains a less-common case, so perhaps 
it's a good thing to give it a longer name!

> Overall, an alternative here might be to remove get_user_pages() and update
> get_user_pages_locked() to add a 'vmas' parameter, however doing this renders
> get_user_pages_unlocked() asymmetric as it would lack an vmas parameter (adding
> one there would make no sense as VM_FAULT_RETRY logic invalidates VMAs) though
> perhaps not such a big issue as it makes sense as to why this is the case.

Adding the 'vmas' parameter to get_user_pages_locked would make little 
sense.  Since VM_FAULT_RETRY invalidates it and g_u_p_locked can and 
does retry, it would generally not be useful.

So I think we have the right API now:

- do not have lock?  Use get_user_pages_unlocked, get retry for free,
no need to handle  mmap_sem and the locked argument; cannot get back vmas.

- have and cannot drop lock?  User get_user_pages, no need to pass NULL 
for the locked argument; can get back vmas.

- have but can drop lock (rare case)?  Use get_user_pages_locked, 
cannot get back vams.

Paolo

> get_user_pages_unlocked() definitely seems to be a useful helper and therefore
> makes sense to retain.

> Of course another alternative is to leave things be :)
> 

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 1db0af6c7f94..dae4eb8e9d5b 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -152,7 +152,8 @@ static void videobuf_dma_init(struct videobuf_dmabuf *dma)
 }
 
 static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
-			int direction, unsigned long data, unsigned long size)
+			int direction, unsigned long data, unsigned long size,
+			int *locked)
 {
 	unsigned long first, last;
 	int err, rw = 0;
@@ -185,8 +186,17 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
-			     flags, dma->pages, NULL);
+	if (locked && !*locked) {
+		down_read(&current->mm->mmap_sem);
+		*locked = 1;
+	}
+
+	/*
+	 * If the caller cannot have mmap_sem dropped, it passes locked == NULL
+	 * so get_user_pages_locked will not release it.
+	 */
+	err = get_user_pages_locked(data & PAGE_MASK, dma->nr_pages,
+				    flags, dma->pages, locked);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
@@ -200,10 +210,11 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
 			   unsigned long data, unsigned long size)
 {
 	int ret;
+	int locked = 0;
 
-	down_read(&current->mm->mmap_sem);
-	ret = videobuf_dma_init_user_locked(dma, direction, data, size);
-	up_read(&current->mm->mmap_sem);
+	ret = videobuf_dma_init_user_locked(dma, direction, data, size, &locked);
+	if (locked)
+		up_read(&current->mm->mmap_sem);
 
 	return ret;
 }
@@ -540,7 +551,7 @@ static int __videobuf_iolock(struct videobuf_queue *q,
 
 			err = videobuf_dma_init_user_locked(&mem->dma,
 						      DMA_FROM_DEVICE,
-						      vb->baddr, vb->bsize);
+						      vb->baddr, vb->bsize, NULL);
 			if (0 != err)
 				return err;
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* Re: [PATCH 2/2] mm: remove get_user_pages_locked()
From: Lorenzo Stoakes @ 2016-10-31 19:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-mm, Linus Torvalds, Michal Hocko, Jan Kara, Hugh Dickins,
	Dave Hansen, Rik van Riel, Mel Gorman, Andrew Morton,
	linux-kernel, linux-cris-kernel, linux-ia64, dri-devel,
	linux-rdma, kvm, linux-media, devel
In-Reply-To: <ddbe34d0-5d29-abce-1627-f464635bbfe6@redhat.com>

On Mon, Oct 31, 2016 at 06:55:33PM +0100, Paolo Bonzini wrote:
> > 2. There is currently only one caller of get_user_pages_locked() in
> >    mm/frame_vector.c which seems to suggest this function isn't widely
> >    used/known.
>
> Or not widely necessary. :)

Well, quite :)
>
> > 3. This change results in all slow-path get_user_pages*() functions having the
> >    ability to use VM_FAULT_RETRY logic rather than 'defaulting' to using
> >    get_user_pages() that doesn't let you do this even if you wanted to.
>
> This is only true if someone does the work though.  From a quick look
> at your series, the following files can be trivially changed to use
> get_user_pages_unlocked:
>
> - drivers/gpu/drm/via/via_dmablit.c
> - drivers/platform/goldfish/goldfish_pipe.c
> - drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
> - drivers/rapidio/devices/rio_mport_cdev.c
> - drivers/virt/fsl_hypervisor.c
>

Ah indeed, I rather glossed through the callers and noticed that at least some
held locks long enough or were called with a lock held sufficient that I wasn't
sure it could be released.

> Also, videobuf_dma_init_user could be changed to use retry by adding a
> *locked argument to videobuf_dma_init_user_locked, prototype patch
> after my signature.
>

Very nice!

> Everything else is probably best kept using get_user_pages.
>
> > 4. It's somewhat confusing/redundant having both get_user_pages_locked() and
> >    get_user_pages() functions which both require mmap_sem to be held (i.e. both
> >    are 'locked' versions.)
> >
> >> If all callers were changed, then sure removing the _locked suffix would
> >> be a good idea.
> >
> > It seems many callers cannot release mmap_sem so VM_FAULT_RETRY logic couldn't
> > happen anyway in this cases and in these cases we couldn't change the caller.
>
> But then get_user_pages_locked remains a less-common case, so perhaps
> it's a good thing to give it a longer name!

My (somewhat minor) concern was that there would be confusion due to the
existence of the triumvirate of g_u_p()/g_u_p_unlocked()/g_u_p_locked(), however
the comments do a decent enough job of explaining the situation.

>
> > Overall, an alternative here might be to remove get_user_pages() and update
> > get_user_pages_locked() to add a 'vmas' parameter, however doing this renders
> > get_user_pages_unlocked() asymmetric as it would lack an vmas parameter (adding
> > one there would make no sense as VM_FAULT_RETRY logic invalidates VMAs) though
> > perhaps not such a big issue as it makes sense as to why this is the case.
>
> Adding the 'vmas' parameter to get_user_pages_locked would make little
> sense.  Since VM_FAULT_RETRY invalidates it and g_u_p_locked can and
> does retry, it would generally not be useful.

I meant only in the case where we'd remove get_user_pages() and instead be left
with get_user_pages_[un]locked() only, meaning we'd have to add some means of
retrieving vmas if locked was set NULL, of course in cases where locked is not
NULL it makes no sense to add it.

>
> So I think we have the right API now:
>
> - do not have lock?  Use get_user_pages_unlocked, get retry for free,
> no need to handle  mmap_sem and the locked argument; cannot get back vmas.
>
> - have and cannot drop lock?  User get_user_pages, no need to pass NULL
> for the locked argument; can get back vmas.
>
> - have but can drop lock (rare case)?  Use get_user_pages_locked,
> cannot get back vams.

Yeah I think this is sane as it is actually, this patch set was a lot less
convincing of a cleanup than prior ones and overall it seems we are better off
with the existing API.

I wonder whether a better patch series to come out of this would be to find
cases where the lock could be dropped (i.e. the ones you mention above) and to
switch to using get_user_pages_[un]locked() where it makes sense to.

I am happy to look into these cases (though of course I must leave your
suggested patch here to you :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
From: Matan Barak @ 2016-10-31 22:58 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A47BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>

On Sun, Oct 30, 2016 at 9:28 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I found this patch very hard to follow.  This was in part due to the output of the patch command itself, but also because there lacked sufficient documentation on what the new data structures were for and the terms being used.  As a result, I had to bounce around the patch to figure things out, adding comments as I went along, until I finally just gave up trying to read it.

Actually, there are some helpful slides in the OFVWG presentations. I
guess it would be best to clarify the model in the commit message.

>
>> The new ioctl infrastructure supports driver specific objects.
>> Each such object type has a free function, allocation size and an
>
> You can replace the allocation size with an alloc function, to pair with the free call.  Then the object can be initialized by the user.
>

I had thought about that, but the user could initialize its part of
the object in the function handler. It can't allocate the object as we
need it in order to allocate an IDR entry and co. The assumption here
is that the "unlock" stage can't fail.

>> order of destruction. This information is embedded in the same
>> table describing the various action allowed on the object, similarly
>> to object oriented programming.
>>
>> When a ucontext is created, a new list is created in this ib_ucontext.
>> This list contains all objects created under this ib_ucontext.
>> When a ib_ucontext is destroyed, we traverse this list several time
>> destroying the various objects by the order mentioned in the object
>> type description. If few object types have the same destruction order,
>> they are destroyed in an order opposite to their creation order.
>
> Could we simply walk the list backwards, destroying all objects with a reference count of 1 - repeat if necessary?  Basically avoid complex rules for this.
>

That's problematic in the MW case. A MW could be disassociated from
its MR by a remote peer. The kernel can't follow that.

> In fact, it would be great if we could just cleanup the list in the reverse order that items were created.  Maybe this requires supporting a pre-cleanup handler, so that the driver can pluck items out of the list that may need to be destroyed out of order.
>

So that's essentially one layer of ordering. Why do you consider a
driver iterating over all objects simpler than this model?

>> Adding an object is done in two parts.
>> First, an object is allocated and added to IDR/fd table. Then, the
>> command's handlers (in downstream patches) could work on this object
>> and fill in its required details.
>> After a successful command, ib_uverbs_uobject_enable is called and
>> this user objects becomes ucontext visible.
>
> If you have a way to mark that an object is used for exclusive access, you may be able to use that instead of introducing a new variable.  (I.e. acquire the object's write lock).  I think we want to make an effort to minimize the size of the kernel structure needed to track every user space object (within reason).
>

I didn't really follow. A command attribute states the nature of the
locking (for example, in MODIFY_QP the QP could be exclusively locked,
but in QUERY_QP it's only locked for reading). I don't want to really
grab a lock, as if I were I could face a dead-lock (user-space could
pass parameters in a colliding order), It could be solved by sorting
the handles, but that would degrade performance without a good reasob.

>> Removing an uboject is done by calling ib_uverbs_uobject_remove.
>>
>> We should make sure IDR (per-device) and list (per-ucontext) could
>> be accessed concurrently without corrupting them.
>>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>
> As a general comment, I do have concerns that the resulting generalized parsing of everything will negatively impact performance for operations that do have to transition into the kernel.  Not all devices offload all operations to user space.  Plus the resulting code is extremely difficult to read and non-trivial to use.  It's equivalent to reading C++ code that has 4 layers of inheritance with overrides to basic operators...

There are two parts here. I think the handlers themselves are simpler,
easier to read and less error-prone. They contain less code
duplications. The macro based define language explicitly declare all
attributes, their types, size, etc.
The model here is a bit more complex as we want to achieve both code
resue and add/override of new types/actions/attributes.


>
> Pre and post operators per command that can do straightforward validation seem like a better option.
>
>

I think that would duplicate a lot of code and will be more
error-prone than one infrastrucutre that automates all that work for
you.

>>  drivers/infiniband/core/Makefile      |   3 +-
>>  drivers/infiniband/core/device.c      |   1 +
>>  drivers/infiniband/core/rdma_core.c   | 489
>> ++++++++++++++++++++++++++++++++++
>>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>>  drivers/infiniband/core/uverbs.h      |   1 +
>>  drivers/infiniband/core/uverbs_main.c |   2 +-
>>  include/rdma/ib_verbs.h               |  28 +-
>>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>>  8 files changed, 789 insertions(+), 5 deletions(-)
>>  create mode 100644 drivers/infiniband/core/rdma_core.c
>>  create mode 100644 drivers/infiniband/core/rdma_core.h
>>  create mode 100644 include/rdma/uverbs_ioctl.h
>>
>> diff --git a/drivers/infiniband/core/Makefile
>> b/drivers/infiniband/core/Makefile
>> index edaae9f..1819623 100644
>> --- a/drivers/infiniband/core/Makefile
>> +++ b/drivers/infiniband/core/Makefile
>> @@ -28,4 +28,5 @@ ib_umad-y :=                        user_mad.o
>>
>>  ib_ucm-y :=                  ucm.o
>>
>> -ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o
>> +ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o \
>> +                             rdma_core.o
>> diff --git a/drivers/infiniband/core/device.c
>> b/drivers/infiniband/core/device.c
>> index c3b68f5..43994b1 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>>       spin_lock_init(&device->client_data_lock);
>>       INIT_LIST_HEAD(&device->client_data_list);
>>       INIT_LIST_HEAD(&device->port_list);
>> +     INIT_LIST_HEAD(&device->type_list);
>>
>>       return device;
>>  }
>> diff --git a/drivers/infiniband/core/rdma_core.c
>> b/drivers/infiniband/core/rdma_core.c
>> new file mode 100644
>> index 0000000..337abc2
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.c
>> @@ -0,0 +1,489 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include <linux/file.h>
>> +#include <linux/anon_inodes.h>
>> +#include <rdma/ib_verbs.h>
>> +#include "uverbs.h"
>> +#include "rdma_core.h"
>> +#include <rdma/uverbs_ioctl.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type)
>> +{
>> +     const struct uverbs_types_group *groups = ibdev->types_group;
>> +     const struct uverbs_types *types;
>> +     int ret = groups->dist(&type, groups->priv);
>> +
>> +     if (ret >= groups->num_groups)
>> +             return NULL;
>> +
>> +     types = groups->type_groups[ret];
>> +
>> +     if (type >= types->num_types)
>> +             return NULL;
>> +
>> +     return types->types[type];
>> +}
>> +
>> +static int uverbs_lock_object(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_READ)
>> +             return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>> +
>> +     /* lock is either WRITE or DESTROY - should be exclusive */
>> +     return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>
> This function could take the lock type directly (read or write), versus inferring it based on some other access type.
>

We can, but since we use these enums in the attribute specifications,
I thought it could be more convinient.

>> +}
>> +
>> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
>> *context)
>> +{
>> +     struct ib_uobject *uobj;
>> +
>> +     rcu_read_lock();
>> +     uobj = idr_find(&context->device->idr, id);
>> +     if (uobj && uobj->live) {
>> +             if (uobj->context != context)
>> +                     uobj = NULL;
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_ucontext_lock {
>> +     struct kref  ref;
>> +     /* locking the uobjects_list */
>> +     struct mutex lock;
>> +};
>> +
>> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
>> +{
>> +     mutex_init(&lock->lock);
>> +     kref_init(&lock->ref);
>> +}
>> +
>> +static void release_uobjects_list_lock(struct kref *ref)
>> +{
>> +     struct ib_ucontext_lock *lock = container_of(ref,
>> +                                                  struct ib_ucontext_lock,
>> +                                                  ref);
>> +
>> +     kfree(lock);
>> +}
>> +
>> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
>> +                   struct ib_ucontext *context)
>> +{
>> +     init_rwsem(&uobj->usecnt);
>> +     uobj->user_handle = user_handle;
>> +     uobj->context     = context;
>> +     uobj->live        = 0;
>> +}
>> +
>> +static int add_uobj(struct ib_uobject *uobj)
>> +{
>> +     int ret;
>> +
>> +     idr_preload(GFP_KERNEL);
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +
>> +     ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
>> GFP_NOWAIT);
>> +     if (ret >= 0)
>> +             uobj->id = ret;
>> +
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +     idr_preload_end();
>> +
>> +     return ret < 0 ? ret : 0;
>> +}
>> +
>> +static void remove_uobj(struct ib_uobject *uobj)
>> +{
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +     idr_remove(&uobj->context->device->idr, uobj->id);
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +}
>> +
>> +static void put_uobj(struct ib_uobject *uobj)
>> +{
>> +     kfree_rcu(uobj, rcu);
>> +}
>> +
>> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext
>> *ucontext,
>> +                                                const struct
>> uverbs_type_alloc_action *type,
>> +                                                u32 idr,
>> +                                                enum uverbs_idr_access access)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     rcu_read_lock();
>> +     uobj = get_uobj(idr, ucontext);
>> +     if (!uobj)
>> +             goto free;
>> +
>> +     if (uobj->type != type) {
>> +             uobj = NULL;
>> +             goto free;
>> +     }
>> +
>> +     ret = uverbs_lock_object(uobj, access);
>> +     if (ret)
>> +             uobj = ERR_PTR(ret);
>> +free:
>> +     rcu_read_unlock();
>> +     return uobj;
>> +
>> +     return NULL;
>> +}
>> +
>> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
>> +                              const struct uverbs_type_alloc_action
>> *uobject_type)
>> +{
>> +     uobject->type = uobject_type;
>> +     return add_uobj(uobject);
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             /* lock idr */
>
> Command to lock idr, but no lock is obtained.
>

ib_uverbs_uobject_add calls add_uobj which locks the IDR.

>> +             ret = ib_uverbs_uobject_add(uobj, type);
>> +             if (ret) {
>> +                     kfree(uobj);
>> +                     return ERR_PTR(ret);
>> +             }
>> +
>> +     } else {
>> +             uobj = get_uobject_from_context(ucontext, type, idr,
>> +                                             access);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOENT);
>> +     }
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             int _fd;
>> +             struct ib_uobject *uobj = NULL;
>> +             struct file *filp;
>> +
>> +             _fd = get_unused_fd_flags(O_CLOEXEC);
>> +             if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
>> ib_uobject)))
>> +                     return ERR_PTR(_fd);
>> +
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             filp = anon_inode_getfile(type->fd.name, type->fd.fops,
>> +                                       uobj + 1, type->fd.flags);
>> +             if (IS_ERR(filp)) {
>> +                     put_unused_fd(_fd);
>> +                     kfree(uobj);
>> +                     return (void *)filp;
>> +             }
>> +
>> +             uobj->type = type;
>> +             uobj->id = _fd;
>> +             uobj->object = filp;
>> +
>> +             return uobj;
>> +     } else if (access == UVERBS_IDR_ACCESS_READ) {
>> +             struct file *f = fget(fd);
>> +             struct ib_uobject *uobject;
>> +
>> +             if (!f)
>> +                     return ERR_PTR(-EBADF);
>> +
>> +             uobject = f->private_data - sizeof(struct ib_uobject);
>> +             if (f->f_op != type->fd.fops ||
>> +                 !uobject->live) {
>> +                     fput(f);
>> +                     return ERR_PTR(-EBADF);
>> +             }
>> +
>> +             /*
>> +              * No need to protect it with a ref count, as fget
>> increases
>> +              * f_count.
>> +              */
>> +             return uobject;
>> +     } else {
>> +             return ERR_PTR(-EOPNOTSUPP);
>> +     }
>> +}
>> +
>> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
>> +{
>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_add(&uobject->list, &uobject->context->uobjects);
>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>
> Why not just insert the object into the list on creation?
>
>> +     uobject->live = 1;
>
> See my comments above on removing the live field.
>

Seems that the list could suffice, but I'll look into that.

>> +}
>> +
>> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool
>> lock)
>> +{
>> +     /*
>> +      * Calling remove requires exclusive access, so it's not possible
>> +      * another thread will use our object.
>> +      */
>> +     uobject->live = 0;
>> +     uobject->type->free_fn(uobject->type, uobject);
>> +     if (lock)
>> +             mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_del(&uobject->list);
>> +     if (lock)
>> +             mutex_unlock(&uobject->context->uobjects_lock->lock);
>> +     remove_uobj(uobject);
>> +     put_uobj(uobject);
>> +}
>> +
>> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access,
>> +                           bool success)
>> +{
>> +     switch (access) {
>> +     case UVERBS_IDR_ACCESS_READ:
>> +             up_read(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_NEW:
>> +             if (success) {
>> +                     ib_uverbs_uobject_enable(uobj);
>> +             } else {
>> +                     remove_uobj(uobj);
>> +                     put_uobj(uobj);
>> +             }
>> +             break;
>> +     case UVERBS_IDR_ACCESS_WRITE:
>> +             up_write(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_DESTROY:
>> +             if (success)
>> +                     ib_uverbs_uobject_remove(uobj, true);
>> +             else
>> +                     up_write(&uobj->usecnt);
>> +             break;
>> +     }
>> +}
>> +
>> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
>> +                          enum uverbs_idr_access access,
>> +                          bool success)
>> +{
>> +     struct file *filp = uobj->object;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             if (success) {
>> +                     kref_get(&uobj->context->ufile->ref);
>> +                     uobj->uobjects_lock = uobj->context->uobjects_lock;
>> +                     kref_get(&uobj->uobjects_lock->ref);
>> +                     ib_uverbs_uobject_enable(uobj);
>> +                     fd_install(uobj->id, uobj->object);
>
> I don't get this.  The function is unlocking something, but there are calls to get krefs?
>

Before invoking the user's callback, we're first locking all objects
and afterwards we're unlocking them. When we need to create a new
object, the lock becomes object creation and the unlock could become
(assuming the user's callback succeeded) enabling this new object.
When you add a new object (or fd in this case), we take a reference
count to both the uverbs_file and the locking context.

>> +             } else {
>> +                     fput(uobj->object);
>> +                     put_unused_fd(uobj->id);
>> +                     kfree(uobj);
>> +             }
>> +     } else {
>> +             fput(filp);
>> +     }
>> +}
>> +
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success)
>> +{
>> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +             uverbs_unlock_idr(uobj, access, success);
>> +     else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
>> +             uverbs_unlock_fd(uobj, access, success);
>> +     else
>> +             WARN_ON(true);
>> +}
>> +
>> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
>> +{
>> +     /*
>> +      * user should release the uobject in the release
>> +      * callback.
>> +      */
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             uobject->type->free_fn(uobject->type, uobject);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +}
>> +
>> +void ib_uverbs_close_fd(struct file *f)
>> +{
>> +     struct ib_uobject *uobject = f->private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     mutex_lock(&uobject->uobjects_lock->lock);
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +     mutex_unlock(&uobject->uobjects_lock->lock);
>> +     kref_put(&uobject->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +void ib_uverbs_cleanup_fd(void *private_data)
>> +{
>> +     struct ib_uboject *uobject = private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     kfree(uobject);
>> +}
>> +
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success)
>> +{
>> +     unsigned int i;
>> +
>> +     for (i = 0; i < num; i++) {
>> +             struct uverbs_attr_array *attr_spec_array = &attr_array[i];
>> +             const struct uverbs_attr_group_spec *group_spec =
>> +                     spec->attr_groups[i];
>> +             unsigned int j;
>> +
>> +             for (j = 0; j < attr_spec_array->num_attrs; j++) {
>> +                     struct uverbs_attr *attr = &attr_spec_array-
>> >attrs[j];
>> +                     struct uverbs_attr_spec *spec = &group_spec-
>> >attrs[j];
>> +
>> +                     if (!attr->valid)
>> +                             continue;
>> +
>> +                     if (spec->type == UVERBS_ATTR_TYPE_IDR ||
>> +                         spec->type == UVERBS_ATTR_TYPE_FD)
>> +                             /*
>> +                              * refcounts should be handled at the object
>> +                              * level and not at the uobject level.
>> +                              */
>> +                             uverbs_unlock_object(attr->obj_attr.uobject,
>> +                                                  spec->obj.access, success);
>> +             }
>> +     }
>> +}
>> +
>> +static unsigned int get_type_orders(const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int i;
>> +     unsigned int max = 0;
>> +
>> +     for (i = 0; i < types_group->num_groups; i++) {
>> +             unsigned int j;
>> +             const struct uverbs_types *types = types_group-
>> >type_groups[i];
>> +
>> +             for (j = 0; j < types->num_types; j++) {
>> +                     if (!types->types[j] || !types->types[j]->alloc)
>> +                             continue;
>> +                     if (types->types[j]->alloc->order > max)
>> +                             max = types->types[j]->alloc->order;
>> +             }
>> +     }
>> +
>> +     return max;
>> +}
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int num_orders = get_type_orders(types_group);
>> +     unsigned int i;
>> +
>> +     for (i = 0; i <= num_orders; i++) {
>> +             struct ib_uobject *obj, *next_obj;
>> +
>> +             /*
>> +              * No need to take lock here, as cleanup should be called
>> +              * after all commands finished executing. Newly executed
>> +              * commands should fail.
>> +              */
>> +             mutex_lock(&ucontext->uobjects_lock->lock);
>
> It's really confusing to see a comment about 'no need to take lock' immediately followed by a call to lock.
>

Yeah :) That was before adding the fd. I'll delete the comment.

>> +             list_for_each_entry_safe(obj, next_obj, &ucontext-
>> >uobjects,
>> +                                      list)
>> +                     if (obj->type->order == i) {
>> +                             if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +                                     ib_uverbs_uobject_remove(obj, false);
>> +                             else
>> +                                     ib_uverbs_remove_fd(obj);
>> +                     }
>> +             mutex_unlock(&ucontext->uobjects_lock->lock);
>> +     }
>> +     kref_put(&ucontext->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext)
>
> Please work on the function names.  This is horrendously long and still doesn't help describe what it does.
>

This just initialized the types part of the ucontext. Any suggestions?

>> +{
>> +     ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
>> >uobjects_lock),
>> +                                       GFP_KERNEL);
>> +     if (!ucontext->uobjects_lock)
>> +             return -ENOMEM;
>> +
>> +     init_uobjects_list_lock(ucontext->uobjects_lock);
>> +     INIT_LIST_HEAD(&ucontext->uobjects);
>> +
>> +     return 0;
>> +}
>> +
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext)
>> +{
>> +     kfree(ucontext->uobjects_lock);
>> +}
>
> No need to wrap a call to 'free'.
>

In order to abstract away the ucontext type data structure.

>> +
>> diff --git a/drivers/infiniband/core/rdma_core.h
>> b/drivers/infiniband/core/rdma_core.h
>> new file mode 100644
>> index 0000000..8990115
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.h
>> @@ -0,0 +1,75 @@
>> +/*
>> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
>> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
>> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
>> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef UOBJECT_H
>> +#define UOBJECT_H
>> +
>> +#include <linux/idr.h>
>> +#include <rdma/uverbs_ioctl.h>
>> +#include <rdma/ib_verbs.h>
>> +#include <linux/mutex.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type);
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr);
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd);
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success);
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success);
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group);
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_close_fd(struct file *f);
>> +void ib_uverbs_cleanup_fd(void *private_data);
>> +
>> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
>> +{
>> +     return uobj + 1;
>> +}
>
> This seems like a rather useless function.
>

Why? The user sholdn't know or care how we put our structs together.

>> +
>> +#endif /* UIDR_H */
>> diff --git a/drivers/infiniband/core/uverbs.h
>> b/drivers/infiniband/core/uverbs.h
>> index 8074705..ae7d4b8 100644
>> --- a/drivers/infiniband/core/uverbs.h
>> +++ b/drivers/infiniband/core/uverbs.h
>> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
>> *uverbs_file,
>>                                       struct ib_device *ib_dev,
>>                                       int is_async);
>> +void ib_uverbs_release_file(struct kref *ref);
>>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
>> *uverbs_file);
>>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
>>
>> diff --git a/drivers/infiniband/core/uverbs_main.c
>> b/drivers/infiniband/core/uverbs_main.c
>> index f783723..e63357a 100644
>> --- a/drivers/infiniband/core/uverbs_main.c
>> +++ b/drivers/infiniband/core/uverbs_main.c
>> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
>> ib_uverbs_device *dev)
>>       complete(&dev->comp);
>>  }
>>
>> -static void ib_uverbs_release_file(struct kref *ref)
>> +void ib_uverbs_release_file(struct kref *ref)
>>  {
>>       struct ib_uverbs_file *file =
>>               container_of(ref, struct ib_uverbs_file, ref);
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index b5d2075..7240615 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
>>
>>  struct ib_umem;
>>
>> +struct ib_ucontext_lock;
>> +
>>  struct ib_ucontext {
>>       struct ib_device       *device;
>> +     struct ib_uverbs_file  *ufile;
>>       struct list_head        pd_list;
>>       struct list_head        mr_list;
>>       struct list_head        mw_list;
>> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>>       struct list_head        rwq_ind_tbl_list;
>>       int                     closing;
>>
>> +     /* lock for uobjects list */
>> +     struct ib_ucontext_lock *uobjects_lock;
>> +     struct list_head        uobjects;
>> +
>>       struct pid             *tgid;
>>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>       struct rb_root      umem_tree;
>> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>>  #endif
>>  };
>>
>> +struct uverbs_object_list;
>> +
>> +#define OLD_ABI_COMPAT
>> +
>>  struct ib_uobject {
>>       u64                     user_handle;    /* handle given to us by userspace
>> */
>>       struct ib_ucontext     *context;        /* associated user context
>> */
>>       void                   *object;         /* containing object */
>>       struct list_head        list;           /* link to context's list */
>> -     int                     id;             /* index into kernel idr */
>> -     struct kref             ref;
>> -     struct rw_semaphore     mutex;          /* protects .live */
>> +     int                     id;             /* index into kernel idr/fd */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct kref             ref;
>> +#endif
>> +     struct rw_semaphore     usecnt;         /* protects exclusive
>> access */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct rw_semaphore     mutex;          /* protects .live */
>> +#endif
>>       struct rcu_head         rcu;            /* kfree_rcu() overhead */
>>       int                     live;
>> +
>> +     const struct uverbs_type_alloc_action *type;
>> +     struct ib_ucontext_lock *uobjects_lock;
>>  };
>>
>>  struct ib_udata {
>> @@ -2101,6 +2120,9 @@ struct ib_device {
>>        */
>>       int (*get_port_immutable)(struct ib_device *, u8, struct
>> ib_port_immutable *);
>>       void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
>> str_len);
>> +     struct list_head type_list;
>> +
>> +     const struct uverbs_types_group *types_group;
>>  };
>>
>>  struct ib_client {
>> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
>> new file mode 100644
>> index 0000000..2f50045
>> --- /dev/null
>> +++ b/include/rdma/uverbs_ioctl.h
>> @@ -0,0 +1,195 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef _UVERBS_IOCTL_
>> +#define _UVERBS_IOCTL_
>> +
>> +#include <linux/kernel.h>
>> +
>> +struct uverbs_object_type;
>> +struct ib_ucontext;
>> +struct ib_uobject;
>> +struct ib_device;
>> +struct uverbs_uobject_type;
>> +
>> +/*
>> + * =======================================
>> + *   Verbs action specifications
>> + * =======================================
>> + */
>
> I intentionally used urdma (though condensed to 3 letters that I don't recall atm), rather than uverbs.  This will need to work with non-verbs devices and interfaces -- again, consider how this fits with the rdma cm.  Verbs has a very specific meaning, which gets lost if we start referring to everything as 'verbs'.  It's bad enough that we're stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also means ethernet and rdma means nothing.
>

IMHO - let's agree on the concept of this infrastructure. One we
decide its scope, we could generalize it (i.e - ioctl_provider and
ioctl_context) and implement it to rdma-cm as well.

>> +
>> +enum uverbs_attr_type {
>> +     UVERBS_ATTR_TYPE_PTR_IN,
>> +     UVERBS_ATTR_TYPE_PTR_OUT,
>> +     UVERBS_ATTR_TYPE_IDR,
>> +     UVERBS_ATTR_TYPE_FD,
>> +};
>> +
>> +enum uverbs_idr_access {
>> +     UVERBS_IDR_ACCESS_READ,
>> +     UVERBS_IDR_ACCESS_WRITE,
>> +     UVERBS_IDR_ACCESS_NEW,
>> +     UVERBS_IDR_ACCESS_DESTROY
>> +};
>> +
>> +struct uverbs_attr_spec {
>> +     u16                             len;
>> +     enum uverbs_attr_type           type;
>> +     struct {
>> +             u16                     obj_type;
>> +             u8                      access;
>
> Is access intended to be an enum uverbs_idr_access value?
>

Yeah, worth using this enum. Thanks.

>> +     } obj;
>
> I would remove (flatten) the substructure and re-order the fields for better alignment.
>

I noticed there are several places which aren't aliged. It's in my todo list.

>> +};
>> +
>> +struct uverbs_attr_group_spec {
>> +     struct uverbs_attr_spec         *attrs;
>> +     size_t                          num_attrs;
>> +};
>> +
>> +struct uverbs_action_spec {
>> +     const struct uverbs_attr_group_spec             **attr_groups;
>> +     /* if > 0 -> validator, otherwise, error */
>
> ? not sure what this comment means
>
>> +     int (*dist)(__u16 *attr_id, void *priv);
>
> What does 'dist' stand for?
>

dist = distribution function.
It maps the attributes you got from the user-space to your groups. You
could think of each group as a namespace - where its attributes (or
types/actions) starts from zero in the sake of compactness.
So, for example, it gets an attribute 0x8010 and maps to to "group 1"
(provider) and attribute 0x10.

>> +     void                                            *priv;
>> +     size_t                                          num_groups;
>> +};
>> +
>> +struct uverbs_attr_array;
>> +struct ib_uverbs_file;
>> +
>> +struct uverbs_action {
>> +     struct uverbs_action_spec spec;
>> +     void *priv;
>> +     int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
>> *ufile,
>> +                    struct uverbs_attr_array *ctx, size_t num, void
>> *priv);
>> +};
>> +
>> +struct uverbs_type_alloc_action;
>> +typedef void (*free_type)(const struct uverbs_type_alloc_action
>> *uobject_type,
>> +                       struct ib_uobject *uobject);
>> +
>> +struct uverbs_type_alloc_action {
>> +     enum uverbs_attr_type           type;
>> +     int                             order;
>
> I think this is being used as destroy order, in which case I would rename it to clarify the intent.  Though I'd prefer we come up with a more efficient destruction mechanism than the repeated nested looping.
>

In one of the earlier revisions I used a sorted list, which was
efficient. I recall that Jason didn't like its complexity and
re-thinking about that - he's right. Most of your types are "order
number" 0 anyway. So you'll probably iterate very few objects in the
next round (in verbs, everything but MRs and PDs).

>> +     size_t                          obj_size;
>
> This can be alloc_fn
>
>> +     free_type                       free_fn;
>> +     struct {
>> +             const struct file_operations    *fops;
>> +             const char                      *name;
>> +             int                             flags;
>> +     } fd;
>> +};
>> +
>> +struct uverbs_type_actions_group {
>> +     size_t                                  num_actions;
>> +     const struct uverbs_action              **actions;
>> +};
>> +
>> +struct uverbs_type {
>> +     size_t                                  num_groups;
>> +     const struct uverbs_type_actions_group  **action_groups;
>> +     const struct uverbs_type_alloc_action   *alloc;
>> +     int (*dist)(__u16 *action_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +struct uverbs_types {
>> +     size_t                                  num_types;
>> +     const struct uverbs_type                **types;
>> +};
>> +
>> +struct uverbs_types_group {
>> +     const struct uverbs_types               **type_groups;
>> +     size_t                                  num_groups;
>> +     int (*dist)(__u16 *type_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +/* =================================================
>> + *              Parsing infrastructure
>> + * =================================================
>> + */
>> +
>> +struct uverbs_ptr_attr {
>> +     void    * __user ptr;
>> +     __u16           len;
>> +};
>> +
>> +struct uverbs_fd_attr {
>> +     int             fd;
>> +};
>> +
>> +struct uverbs_uobj_attr {
>> +     /*  idr handle */
>> +     __u32   idr;
>> +};
>> +
>> +struct uverbs_obj_attr {
>> +     /* pointer to the kernel descriptor -> type, access, etc */
>> +     const struct uverbs_attr_spec *val;
>> +     struct ib_uverbs_attr __user    *uattr;
>> +     const struct uverbs_type_alloc_action   *type;
>> +     struct ib_uobject               *uobject;
>> +     union {
>> +             struct uverbs_fd_attr           fd;
>> +             struct uverbs_uobj_attr         uobj;
>> +     };
>> +};
>> +
>> +struct uverbs_attr {
>> +     bool valid;
>> +     union {
>> +             struct uverbs_ptr_attr  cmd_attr;
>> +             struct uverbs_obj_attr  obj_attr;
>> +     };
>> +};
>
> It's odd to have a union that's part of a structure without some field to indicate which union field is accessible.
>

You index this array but the attribute id from the user's callback
funciton. The user should know what's the type of the attribute, as
[s]he declared the specification.

>> +
>> +/* output of one validator */
>> +struct uverbs_attr_array {
>> +     size_t num_attrs;
>> +     /* arrays of attrubytes, index is the id i.e SEND_CQ */
>> +     struct uverbs_attr *attrs;
>> +};
>> +
>> +/* =================================================
>> + *              Types infrastructure
>> + * =================================================
>> + */
>> +
>> +int ib_uverbs_uobject_type_add(struct list_head      *head,
>> +                            void (*free)(struct uverbs_uobject_type *type,
>> +                                         struct ib_uobject *uobject,
>> +                                         struct ib_ucontext *ucontext),
>> +                            uint16_t obj_type);
>> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
>> +
>> +#endif
>> --
>> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks for taking a look.

Regards,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 perftest] Support for Chelsio T6 devices
From: Leon Romanovsky @ 2016-10-31 23:11 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Gil Rockah', 'Zohar Ben Aharon',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <010701d2305d$27a9fff0$76fdffd0$@opengridcomputing.com>

[-- Attachment #1: Type: text/plain, Size: 787 bytes --]

On Thu, Oct 27, 2016 at 09:19:40AM -0500, Steve Wise wrote:
> > > Hey guys,
> > >
> > > Has this patch been integrated yet?  Also, where is the official upstream
> > > perftest git repo now?
> >
> > Hi Steve,
> >
> > Sorry for the late response, due to the holidays our responses are
> > delaying a little bit.
> >
> > We moved perftest repo to be under github's linux-rdma organization [1]
> > and it is now [2].
> >
> > I'll remind to Zohar to take it.
> >
> > [1] https://github.com/linux-rdma/
> > [2] https://github.com/linux-rdma/perftest
>
>
> Thanks for the update!

Hi Steve,
I merged both of your patches
https://patchwork.kernel.org/patch/9341763/
https://patchwork.kernel.org/patch/9219151/
https://github.com/linux-rdma/perftest/pull/1

Sorry for the delay.

>
> Steve.
>
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH] net/mlx5: Simplify a test
From: Christophe JAILLET @ 2016-11-01  7:10 UTC (permalink / raw)
  To: matanb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA, Christophe JAILLET

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet-39ZsbGIQGT5GWvitb5QawA@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 904853f9cf7a..330955f6badc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering *steering)
 {
 
 	steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
-	if (IS_ERR_OR_NULL(steering->root_ns))
+	if (!steering->root_ns)
 		goto cleanup;
 
 	if (init_root_tree(steering, &root_fs, &steering->root_ns->ns.node))
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] net/mlx5: Simplify a test
From: Matan Barak @ 2016-11-01  9:38 UTC (permalink / raw)
  To: Christophe JAILLET, leonro
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20161101071053.12486-1-christophe.jaillet@wanadoo.fr>

On 01/11/2016 09:10, Christophe JAILLET wrote:
> 'create_root_ns()' does not return an error pointer, so the test can be
> simplified to be more consistent.
>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index 904853f9cf7a..330955f6badc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -1833,7 +1833,7 @@ static int init_root_ns(struct mlx5_flow_steering *steering)
>  {
>
>  	steering->root_ns = create_root_ns(steering, FS_FT_NIC_RX);
> -	if (IS_ERR_OR_NULL(steering->root_ns))
> +	if (!steering->root_ns)
>  		goto cleanup;
>
>  	if (init_root_tree(steering, &root_fs, &steering->root_ns->ns.node))
>

Thanks.
Acked-by: Matan Barak <matanb@mellanox.com>

^ permalink raw reply

* Re: [PATCH] net/mlx5: Simplify a test
From: Saeed Mahameed @ 2016-11-01  9:59 UTC (permalink / raw)
  To: Christophe JAILLET, matanb, leonro
  Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <20161101071053.12486-1-christophe.jaillet@wanadoo.fr>



On 11/01/2016 09:10 AM, Christophe JAILLET wrote:
> 'create_root_ns()' does not return an error pointer, so the test can be
> simplified to be more consistent.
>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* Re: [PATCH rdma-next 2/4] IB/core: Support rate limit for packet pacing
From: Yuval Shaia @ 2016-11-01 10:06 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bodong Wang
In-Reply-To: <1477909297-14491-3-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Two (extremely) minor suggestions inline.

Yuval

On Mon, Oct 31, 2016 at 12:21:35PM +0200, Leon Romanovsky wrote:
> From: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add new member rate_limit to ib_qp_attr, it shows the packet pacing rate

Suggesting to replace with:
Add new member rate_limit to ib_qp_attr which holds the packet pacing rate

> in Kbps, 0 means unlimited.
> 
> IB_QP_RATE_LIMIT is added to ib_attr_mask, and it could be used by RAW

Suggesting to replace with:
IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW

> QPs when changing QP state from RTR to RTS, RTS to RTS.
> 
> Signed-off-by: Bodong Wang <bodong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/core/verbs.c | 2 ++
>  include/rdma/ib_verbs.h         | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 8368764..3e688b3 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1014,6 +1014,7 @@ static const struct {
>  						 IB_QP_QKEY),
>  				 [IB_QPT_GSI] = (IB_QP_CUR_STATE		|
>  						 IB_QP_QKEY),
> +				 [IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
>  			 }
>  		}
>  	},
> @@ -1047,6 +1048,7 @@ static const struct {
>  						IB_QP_QKEY),
>  				[IB_QPT_GSI] = (IB_QP_CUR_STATE			|
>  						IB_QP_QKEY),
> +				[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
>  			}
>  		},
>  		[IB_QPS_SQD]   = {
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 5ad43a4..a065361 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1102,6 +1102,7 @@ enum ib_qp_attr_mask {
>  	IB_QP_RESERVED2			= (1<<22),
>  	IB_QP_RESERVED3			= (1<<23),
>  	IB_QP_RESERVED4			= (1<<24),
> +	IB_QP_RATE_LIMIT		= (1<<25),
>  };
>  
>  enum ib_qp_state {
> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>  	u8			rnr_retry;
>  	u8			alt_port_num;
>  	u8			alt_timeout;
> +	u32			rate_limit;
>  };
>  
>  enum ib_wr_opcode {
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH -next] qedr: Use list_move_tail instead of list_del/list_add_tail
From: Amrani, Ram @ 2016-11-01 10:28 UTC (permalink / raw)
  To: Wei Yongjun, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Borundia, Rajesh
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477757993-32186-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> Using list_move_tail() instead of list_del() + list_add_tail().
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..cdaddf9 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -2413,8 +2413,7 @@ static void handle_completed_mrs(struct qedr_dev
> *dev, struct mr_info *info)
>  		 */
>  		pbl = list_first_entry(&info->inuse_pbl_list,
>  				       struct qedr_pbl, list_entry);
> -		list_del(&pbl->list_entry);
> -		list_add_tail(&pbl->list_entry, &info->free_pbl_list);
> +		list_move_tail(&pbl->list_entry, &info->free_pbl_list);
>  		info->completed_handled++;
>  	}
>  }

Thanks

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Amrani, Ram @ 2016-11-01 10:38 UTC (permalink / raw)
  To: Wei Yongjun, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Borundia, Rajesh
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477672427-31575-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> 'qp' is malloced in qedr_create_qp() and should be freed before leaving from the
> error handling cases, otherwise it will cause memory leak.
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..b60f145 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -1477,6 +1477,7 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	struct qedr_ucontext *ctx = NULL;
>  	struct qedr_create_qp_ureq ureq;
>  	struct qedr_qp *qp;
> +	struct ib_qp *ibqp;
>  	int rc = 0;
> 
>  	DP_DEBUG(dev, QEDR_MSG_QP, "create qp: called from %s, pd=%p\n",
> @@ -1486,13 +1487,13 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	if (rc)
>  		return ERR_PTR(rc);
> 
> +	if (attrs->srq)
> +		return ERR_PTR(-EINVAL);
> +
>  	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
>  	if (!qp)
>  		return ERR_PTR(-ENOMEM);
> 
> -	if (attrs->srq)
> -		return ERR_PTR(-EINVAL);
> -
>  	DP_DEBUG(dev, QEDR_MSG_QP,
>  		 "create qp: sq_cq=%p, sq_icid=%d, rq_cq=%p, rq_icid=%d\n",
>  		 get_qedr_cq(attrs->send_cq),
> @@ -1508,7 +1509,10 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  			       "create qp: unexpected udata when creating GSI
> QP\n");
>  			goto err0;
>  		}
> -		return qedr_create_gsi_qp(dev, attrs, qp);
> +		ibqp = qedr_create_gsi_qp(dev, attrs, qp);
> +		if (IS_ERR(ibqp))
> +			kfree(qp);
> +		return ibqp;
>  	}
> 
>  	memset(&in_params, 0, sizeof(in_params));

Thanks again

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox