public inbox for linux-rdma@vger.kernel.org
* [PATCH v6 0/4] Sending kernel pathrecord query to user cache server
@ 2015-06-12 14:06 kaike.wan-ral2JQCrhuEAvxtiuMwx3w
       [not found] ` <1434117966-17978-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-12 14:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

An SA cache is undeniably critical for fabric scalability and performance.
In user space, the ibacm application provides a good example of a
pathrecord cache for address and route resolution, and with the recent
implementation of the provider architecture it offers more extensibility
as an SA cache. In the kernel, ipoib implements its own small pathrecord
cache, which is, however, not available for general use. Furthermore,
implementing an SA cache in user space offers better flexibility, larger
capacity, and more robustness for the system.

In this patch series, a mechanism is implemented that allows ib_sa to
send a pathrecord query to a user-space application (e.g., ibacm) through
netlink. The mechanism could easily be extended to other SA queries.
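
As an illustration (not part of this series), here is a minimal sketch of
how a user-space cache might subscribe to the local service multicast
group and receive the kernel's queries. The constants come from the
patched rdma_netlink.h; the buffer size, the loop body, and the
SOL_NETLINK fallback define are illustrative assumptions:

    #include <linux/netlink.h>
    #include <sys/socket.h>
    #include <stdio.h>

    #ifndef SOL_NETLINK
    #define SOL_NETLINK 270
    #endif

    int main(void)
    {
            struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
            int grp = 3;        /* RDMA_NL_GROUP_LS in the patched header */
            char buf[4096];
            ssize_t len;
            int fd;

            fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_RDMA);
            if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)))
                    return 1;
            /* join the group; RESOLVE requests from ib_sa arrive here */
            setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                       &grp, sizeof(grp));
            while ((len = recv(fd, buf, sizeof(buf), 0)) > 0) {
                    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

                    /* RDMA_NL_GET_OP(nlh->nlmsg_type) identifies the op;
                     * the payload carries the attributes from patch 1 */
                    printf("request seq=%u len=%zd\n", nlh->nlmsg_seq, len);
            }
            return 0;
    }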

With a customized test implemented in the rdma_cm module (not included in
this series), the time to retrieve 1 million pathrecords on a two-node
system dropped from 46660 jiffies (46.66 seconds) to 16119 jiffies
(16.119 seconds), a reduction of more than 60%.

Changes since v5:
- Patch 1:
  - Replace reversible and numb_path attributes with path_use attribute.
  - Define Mandatory attribute flag.
  - Define attribute data types in CPU byte order.
- Patch 4:
  - Change the calculation of the total attribute length.
  - Modify the setting of attributes.

Changes since v4:
- Patch 1: rename LS_NLA_TYPE_NUM_PATH as LS_NLA_TYPE_NUMB_PATH.
- Patch 4: remove the renaming of LS_NLA_TYPE_NUM_PATH as
           LS_NLA_TYPE_NUMB_PATH.

Changes since v3:
- Patch 1: add basic RESOLVE attribute types.
- Patch 4: change the encoding of the RESOLVE request message based on
  the new attribute types and the input comp_mask. Change the response
  handling by iterating all attributes.

Changes since v2:
- Redesign the communication protocol between the kernel and the user-space
  application. Instead of the MAD packet format, the new protocol uses a
  netlink message header and attributes to exchange requests and
  responses between the kernel and user space. The design was described
  here:
  http://www.spinics.net/lists/linux-rdma/msg25621.html

Changes since v1:
- Move kzalloc changes into a separate patch (Patch 3).
- Remove redundant include line (Patch 4).
- Rename struct rdma_nl_resp_msg to struct ib_nl_resp_msg (Patch 4).

Kaike Wan (4):
  IB/netlink: Add defines for local service requests through netlink
  IB/core: Check the presence of netlink multicast group listeners
  IB/sa: Allocate SA query with kzalloc
  IB/sa: Route SA pathrecord query through netlink

 drivers/infiniband/core/netlink.c  |    8 +
 drivers/infiniband/core/sa_query.c |  541 +++++++++++++++++++++++++++++++++++-
 include/rdma/rdma_netlink.h        |    7 +
 include/uapi/rdma/rdma_netlink.h   |   87 ++++++
 4 files changed, 638 insertions(+), 5 deletions(-)



* [PATCH v6 1/4] IB/netlink: Add defines for local service requests through netlink
       [not found] ` <1434117966-17978-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-12 14:06   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 14:06   ` [PATCH v6 2/4] IB/core: Check the presence of netlink multicast group listeners kaike.wan-ral2JQCrhuEAvxtiuMwx3w
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-12 14:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch adds netlink defines for the SA client, the local service
multicast group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 include/uapi/rdma/rdma_netlink.h |   87 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 6e4bb42..d2c50e9 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -7,12 +7,14 @@ enum {
 	RDMA_NL_RDMA_CM = 1,
 	RDMA_NL_NES,
 	RDMA_NL_C4IW,
+	RDMA_NL_SA,
 	RDMA_NL_NUM_CLIENTS
 };
 
 enum {
 	RDMA_NL_GROUP_CM = 1,
 	RDMA_NL_GROUP_IWPM,
+	RDMA_NL_GROUP_LS,
 	RDMA_NL_NUM_GROUPS
 };
 
@@ -128,5 +130,90 @@ enum {
 	IWPM_NLA_ERR_MAX
 };
 
+/* Local service group opcodes */
+enum {
+	RDMA_NL_LS_OP_RESOLVE = 0,
+	RDMA_NL_LS_OP_SET_TIMEOUT,
+	RDMA_NL_LS_NUM_OPS
+};
+
+/* Local service netlink message flags */
+#define RDMA_NL_LS_F_OK		0x0100	/* Success response */
+#define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */
+
+/* Local service attribute type */
+#define RDMA_NLA_F_MANDATORY	(1 << 13)
+#define RDMA_NLA_TYPE_MASK	~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
+				  RDMA_NLA_F_MANDATORY)
+
+enum {
+	LS_NLA_TYPE_STATUS = 0,
+	LS_NLA_TYPE_PATH_RECORD,
+	LS_NLA_TYPE_TIMEOUT,
+	LS_NLA_TYPE_SERVICE_ID,
+	LS_NLA_TYPE_DGID,
+	LS_NLA_TYPE_SGID,
+	LS_NLA_TYPE_TCLASS,
+	LS_NLA_TYPE_PATH_USE,
+	LS_NLA_TYPE_PKEY,
+	LS_NLA_TYPE_QOS_CLASS,
+	LS_NLA_TYPE_MAX
+};
+
+/* Local service status attribute */
+enum {
+	LS_NLA_STATUS_SUCCESS = 0,
+	LS_NLA_STATUS_EINVAL,
+	LS_NLA_STATUS_ENODATA,
+	LS_NLA_STATUS_MAX
+};
+
+struct rdma_nla_ls_status {
+	__u32		status;
+};
+
+/* Local service pathrecord attribute: struct ib_path_rec_data */
+
+/* Local service timeout attribute */
+struct rdma_nla_ls_timeout {
+	__u32		timeout;
+};
+
+/* Local Service ServiceID attribute */
+struct rdma_nla_ls_service_id {
+	__u64		service_id;
+};
+
+/* Local Service DGID/SGID attribute: big endian */
+struct rdma_nla_ls_gid {
+	__u8		gid[16];
+};
+
+/* Local Service Traffic Class attribute */
+struct rdma_nla_ls_tclass {
+	__u8		tclass;
+};
+
+/* Local Service path use attribute */
+enum {
+	LS_NLA_PATH_USE_ALL = 0,
+	LS_NLA_PATH_USE_UNIDIRECTIONAL,
+	LS_NLA_PATH_USE_GMP,
+	LS_NLA_PATH_USE_MAX
+};
+
+struct rdma_nla_ls_path_use {
+	__u8		use;
+};
+
+/* Local Service Pkey attribute */
+struct rdma_nla_ls_pkey {
+	__u16		pkey;
+};
+
+/* Local Service Qos Class attribute */
+struct rdma_nla_ls_qos_class {
+	__u16		qos_class;
+};
 
 #endif /* _UAPI_RDMA_NETLINK_H */
-- 
1.7.1
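
As a side note, a hedged sketch (not in the patch) of how a receiver might
split nla_type back into the bare attribute type and the mandatory flag,
given the mask defined above; the helper names are hypothetical:

    static inline int rdma_nla_type(const struct nlattr *nla)
    {
            /* mask off NLA_F_NESTED, NLA_F_NET_BYTEORDER and
             * RDMA_NLA_F_MANDATORY, leaving the LS_NLA_TYPE_* value */
            return nla->nla_type & RDMA_NLA_TYPE_MASK;
    }

    static inline int rdma_nla_is_mandatory(const struct nlattr *nla)
    {
            return !!(nla->nla_type & RDMA_NLA_F_MANDATORY);
    }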



* [PATCH v6 2/4] IB/core: Check the presence of netlink multicast group listeners
       [not found] ` <1434117966-17978-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-12 14:06   ` [PATCH v6 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-12 14:06   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 14:06   ` [PATCH v6 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 14:06   ` [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  3 siblings, 0 replies; 7+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-12 14:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch adds a function to check if listeners for a netlink multicast
group are present.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/netlink.c |    8 ++++++++
 include/rdma/rdma_netlink.h       |    7 +++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index 23dd5a5..e0fc913 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
 static struct sock *nls;
 static LIST_HEAD(client_list);
 
+int ibnl_chk_listeners(unsigned int group)
+{
+	if (netlink_has_listeners(nls, group) == 0)
+		return -1;
+	return 0;
+}
+EXPORT_SYMBOL(ibnl_chk_listeners);
+
 int ibnl_add_client(int index, int nops,
 		    const struct ibnl_client_cbs cb_table[])
 {
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 0790882..5852661 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -77,4 +77,11 @@ int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
 int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh,
 			unsigned int group, gfp_t flags);
 
+/**
+ * Check if there are any listeners to the netlink group
+ * @group: the netlink group ID
+ * Returns 0 if listeners are present, or a negative value if there are none.
+ */
+int ibnl_chk_listeners(unsigned int group);
+
 #endif /* _RDMA_NETLINK_H */
-- 
1.7.1
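
For context, the intended call pattern for ibnl_chk_listeners(), mirroring
how send_mad() in patch 4 of this series uses it; a sketch only, with
ib_nl_make_request() and the surrounding variables coming from patch 4:

    if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
            /* a user-space cache is listening: try netlink first */
            if (!ib_nl_make_request(query))
                    return id;
    }
    /* no listeners, or the netlink send failed: fall back to MADs */
    ret = ib_post_send_mad(query->mad_buf, NULL);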



* [PATCH v6 3/4] IB/sa: Allocate SA query with kzalloc
       [not found] ` <1434117966-17978-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2015-06-12 14:06   ` [PATCH v6 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 14:06   ` [PATCH v6 2/4] IB/core: Check the presence of netlink multicast group listeners kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-12 14:06   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  2015-06-12 14:06   ` [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  3 siblings, 0 replies; 7+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-12 14:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Replace kmalloc with kzalloc so that all uninitialized fields in the SA
query are zeroed out, avoiding unintended consequences. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..17e1cf7 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -739,7 +739,7 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -861,7 +861,7 @@ int ib_sa_service_rec_query(struct ib_sa_client *client,
 	    method != IB_SA_METHOD_DELETE)
 		return -EINVAL;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -953,7 +953,7 @@ int ib_sa_mcmember_rec_query(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
@@ -1050,7 +1050,7 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
 	port  = &sa_dev->port[port_num - sa_dev->start_port];
 	agent = port->agent;
 
-	query = kmalloc(sizeof *query, gfp_mask);
+	query = kzalloc(sizeof(*query), gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
-- 
1.7.1
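
The zeroed allocation matters because patch 4 adds a flags word to struct
ib_sa_query and only ever ORs bits into it, so it relies on the field
starting out as zero; a sketch of the resulting pattern:

    query = kzalloc(sizeof(*query), gfp_mask);
    if (!query)
            return -ENOMEM;
    /* query->sa_query.flags is already 0: local service off by default */
    IB_SA_ENABLE_LOCAL_SVC(&query->sa_query);   /* callers opt in (patch 4) */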



* [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found] ` <1434117966-17978-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-06-12 14:06   ` [PATCH v6 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-06-12 14:06   ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
       [not found]     ` <1434117966-17978-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  3 siblings, 1 reply; 7+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-06-12 14:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny

From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This patch routes an SA pathrecord query to netlink first and processes
the response appropriately. If a failure is returned, the request is sent
through IB. Whether to route the request to netlink first is determined
by the presence of a listener for the local service netlink multicast
group. If no user-space listener is present, the request is sent through
IB, just as is currently done.

Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c |  533 +++++++++++++++++++++++++++++++++++-
 1 files changed, 532 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 17e1cf7..17390cf 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -45,12 +45,21 @@
 #include <uapi/linux/if_ether.h>
 #include <rdma/ib_pack.h>
 #include <rdma/ib_cache.h>
+#include <rdma/rdma_netlink.h>
+#include <net/netlink.h>
+#include <uapi/rdma/ib_user_sa.h>
+#include <rdma/ib_marshall.h>
 #include "sa.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand subnet administration query support");
 MODULE_LICENSE("Dual BSD/GPL");
 
+#define IB_SA_LOCAL_SVC_TIMEOUT_MIN		100
+#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT		2000
+#define IB_SA_LOCAL_SVC_TIMEOUT_MAX		200000
+static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
+
 struct ib_sa_sm_ah {
 	struct ib_ah        *ah;
 	struct kref          ref;
@@ -80,8 +89,24 @@ struct ib_sa_query {
 	struct ib_mad_send_buf *mad_buf;
 	struct ib_sa_sm_ah     *sm_ah;
 	int			id;
+	u32			flags;
 };
 
+#define IB_SA_ENABLE_LOCAL_SERVICE	0x00000001
+#define IB_SA_CANCEL			0x00000002
+
+#define IB_SA_LOCAL_SVC_ENABLED(query) \
+	((query)->flags & IB_SA_ENABLE_LOCAL_SERVICE)
+#define IB_SA_ENABLE_LOCAL_SVC(query) \
+	((query)->flags |= IB_SA_ENABLE_LOCAL_SERVICE)
+#define IB_SA_DISABLE_LOCAL_SVC(query) \
+	((query)->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE)
+
+#define IB_SA_QUERY_CANCELLED(query) \
+	((query)->flags & IB_SA_CANCEL)
+#define IB_SA_CANCEL_QUERY(query) \
+	((query)->flags |= IB_SA_CANCEL)
+
 struct ib_sa_service_query {
 	void (*callback)(int, struct ib_sa_service_rec *, void *);
 	void *context;
@@ -106,6 +131,26 @@ struct ib_sa_mcmember_query {
 	struct ib_sa_query sa_query;
 };
 
+struct ib_nl_request_info {
+	struct list_head list;
+	u32 seq;
+	unsigned long timeout;
+	struct ib_sa_query *query;
+};
+
+struct ib_nl_attr_info {
+	u16 len;	/* Total attr len: header + payload + padding */
+	ib_sa_comp_mask comp_mask;
+	void *input;
+	void (*set_attrs)(struct sk_buff *skb, struct ib_nl_attr_info *info);
+};
+
+static LIST_HEAD(ib_nl_request_list);
+static DEFINE_SPINLOCK(ib_nl_request_lock);
+static atomic_t ib_nl_sa_request_seq;
+static struct workqueue_struct *ib_nl_wq;
+static struct delayed_work ib_nl_timed_work;
+
 static void ib_sa_add_one(struct ib_device *device);
 static void ib_sa_remove_one(struct ib_device *device);
 
@@ -381,6 +426,451 @@ static const struct ib_field guidinfo_rec_table[] = {
 	  .size_bits    = 512 },
 };
 
+static int ib_nl_send_msg(int opcode, struct ib_nl_attr_info *attrs, u32 seq)
+{
+	struct sk_buff *skb = NULL;
+	struct nlmsghdr *nlh;
+	void *data;
+	int ret = 0;
+
+	if (attrs->len <= 0)
+		return -EMSGSIZE;
+
+	skb = nlmsg_new(attrs->len, GFP_KERNEL);
+	if (!skb) {
+		pr_err("alloc failed ret=%d\n", ret);
+		return -ENOMEM;
+	}
+
+	/* Put nlmsg header only for now */
+	data = ibnl_put_msg(skb, &nlh, seq, 0, RDMA_NL_SA,
+			    opcode, GFP_KERNEL);
+	if (!data) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	/* Add attributes */
+	attrs->set_attrs(skb, attrs);
+
+	/* Repair the nlmsg header length */
+	nlmsg_end(skb, nlh);
+
+	ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
+	if (!ret) {
+		ret = attrs->len;
+	} else {
+		if (ret != -ESRCH)
+			pr_err("ibnl_multicast failed l=%d, r=%d\n",
+			       attrs->len, ret);
+		ret = 0;
+	}
+	return ret;
+}
+
+static struct ib_nl_request_info *
+ib_nl_alloc_request(struct ib_sa_query *query)
+{
+	struct ib_nl_request_info *rinfo;
+
+	rinfo = kzalloc(sizeof(*rinfo), GFP_ATOMIC);
+	if (rinfo == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&rinfo->list);
+	rinfo->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+	rinfo->query = query;
+
+	return rinfo;
+}
+
+static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
+				     struct ib_nl_attr_info *info)
+{
+	struct ib_sa_path_rec *sa_rec = info->input;
+	__u8 val1;
+	__u16 val2;
+	__u64 val3;
+
+	if (info->comp_mask & IB_SA_PATH_REC_SERVICE_ID) {
+		val3 = be64_to_cpu(sa_rec->service_id);
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SERVICE_ID,
+			sizeof(val3), &val3);
+	}
+	if (info->comp_mask & IB_SA_PATH_REC_DGID)
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_DGID,
+			sizeof(sa_rec->dgid), &sa_rec->dgid);
+	if (info->comp_mask & IB_SA_PATH_REC_SGID)
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SGID,
+			sizeof(sa_rec->sgid), &sa_rec->sgid);
+	if (info->comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_TCLASS,
+			sizeof(sa_rec->traffic_class), &sa_rec->traffic_class);
+
+	if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
+	    sa_rec->reversible != 0) {
+		if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) &&
+		    sa_rec->numb_path > 1)
+			val1 = LS_NLA_PATH_USE_ALL;
+		else
+			val1 = LS_NLA_PATH_USE_GMP;
+	} else {
+		val1 = LS_NLA_PATH_USE_UNIDIRECTIONAL;
+	}
+	nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE, sizeof(val1),
+		&val1);
+
+	if (info->comp_mask & IB_SA_PATH_REC_PKEY) {
+		val2 = be16_to_cpu(sa_rec->pkey);
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PKEY,
+			sizeof(val2), &val2);
+	}
+	if (info->comp_mask & IB_SA_PATH_REC_QOS_CLASS) {
+		val2 = be16_to_cpu(sa_rec->qos_class);
+		nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_QOS_CLASS,
+			sizeof(val2), &val2);
+	}
+}
+
+static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info)
+{
+	int len = 0;
+
+	if (info->comp_mask & IB_SA_PATH_REC_SERVICE_ID)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_service_id));
+	if (info->comp_mask & IB_SA_PATH_REC_DGID)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
+	if (info->comp_mask & IB_SA_PATH_REC_SGID)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
+	if (info->comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_tclass));
+	if (info->comp_mask & IB_SA_PATH_REC_PKEY)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_pkey));
+	if (info->comp_mask & IB_SA_PATH_REC_QOS_CLASS)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_qos_class));
+
+	/*
+	 * The path-use attribute is needed regardless of whether reversible
+	 * or numb_path is set, as long as some other field is set
+	 */
+	if (len > 0)
+		len += nla_total_size(sizeof(struct rdma_nla_ls_path_use));
+
+	return len;
+}
+
+static int ib_nl_send_request(struct ib_nl_request_info *rinfo)
+{
+	struct ib_nl_attr_info info;
+	int opcode;
+	struct ib_sa_mad *mad;
+	unsigned long flags;
+	unsigned long delay;
+	int ret;
+
+	mad = rinfo->query->mad_buf->mad;
+	switch (mad->mad_hdr.attr_id) {
+	case cpu_to_be16(IB_SA_ATTR_PATH_REC):
+		opcode = RDMA_NL_LS_OP_RESOLVE;
+		mad = rinfo->query->mad_buf->mad;
+		info.comp_mask = mad->sa_hdr.comp_mask;
+		info.input = rinfo->query->mad_buf->context[1];
+		rinfo->query->mad_buf->context[1] = NULL;
+		info.len = ib_nl_get_path_rec_attrs_len(&info);
+		info.set_attrs = ib_nl_set_path_rec_attrs;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	ret = ib_nl_send_msg(opcode, &info, rinfo->seq);
+	if (ret <= 0) {
+		ret = -EIO;
+		goto request_out;
+	} else {
+		ret = 0;
+	}
+
+	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+	rinfo->timeout = delay + jiffies;
+	list_add_tail(&rinfo->list, &ib_nl_request_list);
+	/* Start the timeout if this is the only request */
+	if (ib_nl_request_list.next == &rinfo->list)
+		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+
+request_out:
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return ret;
+}
+
+static int ib_nl_make_request(struct ib_sa_query *query)
+{
+	struct ib_nl_request_info *rinfo;
+	int ret;
+
+	rinfo = ib_nl_alloc_request(query);
+	if (IS_ERR(rinfo))
+		return -ENOMEM;
+
+	ret = ib_nl_send_request(rinfo);
+	if (ret)
+		kfree(rinfo);
+
+	return ret;
+}
+
+static int ib_nl_cancel_request(struct ib_sa_query *query)
+{
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	int found = 0;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+		/* Let the timeout routine take care of the callback */
+		if (query == rinfo->query) {
+			IB_SA_CANCEL_QUERY(query);
+			rinfo->timeout = jiffies;
+			list_move(&rinfo->list, &ib_nl_request_list);
+			found = 1;
+			mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, 1);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return found;
+}
+
+static void send_handler(struct ib_mad_agent *agent,
+			 struct ib_mad_send_wc *mad_send_wc);
+
+static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
+					   const struct nlmsghdr *nlh)
+{
+	struct ib_mad_send_wc mad_send_wc;
+	struct ib_sa_mad *mad = NULL;
+	const struct nlattr *head, *curr;
+	struct ib_path_rec_data  *rec;
+	int len, rem;
+
+	if (query->callback) {
+		head = (const struct nlattr *) nlmsg_data(nlh);
+		len = nlmsg_len(nlh);
+		nla_for_each_attr(curr, head, len, rem) {
+			if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
+				rec = nla_data(curr);
+				if (rec->flags && IB_PATH_PRIMARY) {
+					mad = query->mad_buf->mad;
+					mad->mad_hdr.method |=
+						IB_MGMT_METHOD_RESP;
+					memcpy(mad->data, rec->path_rec,
+					       sizeof(rec->path_rec));
+					query->callback(query, 0, mad);
+					break;
+				}
+			}
+		}
+	}
+
+	mad_send_wc.send_buf = query->mad_buf;
+	mad_send_wc.status = IB_WC_SUCCESS;
+	send_handler(query->mad_buf->mad_agent, &mad_send_wc);
+}
+
+static void ib_nl_request_timeout(struct work_struct *work)
+{
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	struct ib_sa_query *query;
+	unsigned long delay;
+	struct ib_mad_send_wc mad_send_wc;
+	int ret;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	while (!list_empty(&ib_nl_request_list)) {
+		rinfo = list_entry(ib_nl_request_list.next,
+				   struct ib_nl_request_info, list);
+
+		if (time_after(rinfo->timeout, jiffies)) {
+			delay = rinfo->timeout - jiffies;
+			if ((long)delay <= 0)
+				delay = 1;
+			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+			break;
+		}
+
+		list_del(&rinfo->list);
+		query = rinfo->query;
+		IB_SA_DISABLE_LOCAL_SVC(query);
+		/* Hold the lock to protect against query cancellation */
+		if (IB_SA_QUERY_CANCELLED(query))
+			ret = -1;
+		else
+			ret = ib_post_send_mad(query->mad_buf, NULL);
+		if (ret) {
+			mad_send_wc.send_buf = query->mad_buf;
+			mad_send_wc.status = IB_WC_WR_FLUSH_ERR;
+			spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+			send_handler(query->port->agent, &mad_send_wc);
+			spin_lock_irqsave(&ib_nl_request_lock, flags);
+		}
+		kfree(rinfo);
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+}
+
+static inline int ib_nl_is_good_resolve_resp(const struct nlmsghdr *nlh)
+{
+	const struct nlattr *head, *curr;
+	struct ib_path_rec_data  *rec;
+	int len, rem;
+
+	if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR)
+		return 0;
+
+	if (!(nlh->nlmsg_flags & RDMA_NL_LS_F_OK))
+		return 0;
+
+	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*rec)))
+		return 0;
+
+	head = (const struct nlattr *) nlmsg_data(nlh);
+	len = nlmsg_len(nlh);
+	nla_for_each_attr(curr, head, len, rem) {
+		if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
+			rec = nla_data(curr);
+			if (rec->flags && IB_PATH_PRIMARY)
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
+static int ib_nl_handle_set_timeout(struct sk_buff *skb,
+				    struct netlink_callback *cb)
+{
+	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+	int timeout, delta, abs_delta;
+	const struct nlattr *attr;
+	struct rdma_nla_ls_timeout *to_attr;
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	long delay = 0;
+
+	if (nlmsg_len(nlh) < nla_attr_size(sizeof(*to_attr)))
+		goto settimeout_out;
+
+	attr = (const struct nlattr *) nlmsg_data(nlh);
+	if (attr->nla_type != LS_NLA_TYPE_TIMEOUT ||
+	    nla_len(attr) != sizeof(*to_attr))
+		goto settimeout_out;
+
+	to_attr = (struct rdma_nla_ls_timeout *) nla_data(attr);
+	timeout = (int) to_attr->timeout;
+	if (timeout < IB_SA_LOCAL_SVC_TIMEOUT_MIN)
+		timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN;
+	if (timeout > IB_SA_LOCAL_SVC_TIMEOUT_MAX)
+		timeout = IB_SA_LOCAL_SVC_TIMEOUT_MAX;
+
+	delta = timeout - sa_local_svc_timeout_ms;
+	if (delta < 0)
+		abs_delta = -delta;
+	else
+		abs_delta = delta;
+
+	if (delta != 0) {
+		spin_lock_irqsave(&ib_nl_request_lock, flags);
+		sa_local_svc_timeout_ms = timeout;
+		list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+			if (delta < 0 && abs_delta > rinfo->timeout)
+				rinfo->timeout = 0;
+			else
+				rinfo->timeout += delta;
+
+			/* Get the new delay from the first entry */
+			if (!delay) {
+				delay = rinfo->timeout - jiffies;
+				if (delay <= 0)
+					delay = 1;
+			}
+		}
+		if (delay)
+			mod_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+					 (unsigned long)delay);
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+	}
+
+settimeout_out:
+	return skb->len;
+}
+
+static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
+				     struct netlink_callback *cb)
+{
+	const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+	unsigned long flags;
+	struct ib_nl_request_info *rinfo;
+	struct ib_sa_query *query;
+	struct ib_mad_send_buf *send_buf;
+	struct ib_mad_send_wc mad_send_wc;
+	int found = 0;
+	int ret;
+
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	list_for_each_entry(rinfo, &ib_nl_request_list, list) {
+		/*
+		 * If the query is cancelled, let the timeout routine
+		 * take care of it.
+		 */
+		if (nlh->nlmsg_seq == rinfo->seq) {
+			found = !IB_SA_QUERY_CANCELLED(rinfo->query);
+			if (found)
+				list_del(&rinfo->list);
+			break;
+		}
+	}
+
+	if (!found) {
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		goto resp_out;
+	}
+
+	query = rinfo->query;
+	send_buf = query->mad_buf;
+
+	if (!ib_nl_is_good_resolve_resp(nlh)) {
+		/* if the result is a failure, send out the packet via IB */
+		IB_SA_DISABLE_LOCAL_SVC(query);
+		ret = ib_post_send_mad(query->mad_buf, NULL);
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		if (ret) {
+			mad_send_wc.send_buf = send_buf;
+			mad_send_wc.status = IB_WC_GENERAL_ERR;
+			send_handler(query->port->agent, &mad_send_wc);
+		}
+	} else {
+		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+		ib_nl_process_good_resolve_rsp(query, nlh);
+	}
+
+	kfree(rinfo);
+resp_out:
+	return skb->len;
+}
+
+static struct ibnl_client_cbs ib_sa_cb_table[] = {
+	[RDMA_NL_LS_OP_RESOLVE] = {
+		.dump = ib_nl_handle_resolve_resp,
+		.module = THIS_MODULE },
+	[RDMA_NL_LS_OP_SET_TIMEOUT] = {
+		.dump = ib_nl_handle_set_timeout,
+		.module = THIS_MODULE },
+};
+
 static void free_sm_ah(struct kref *kref)
 {
 	struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref);
@@ -502,7 +992,13 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query)
 	mad_buf = query->mad_buf;
 	spin_unlock_irqrestore(&idr_lock, flags);
 
-	ib_cancel_mad(agent, mad_buf);
+	/*
+	 * If the query is still on the netlink request list, schedule
+	 * it to be cancelled by the timeout routine. Otherwise, it has been
+	 * sent to the MAD layer and has to be cancelled from there.
+	 */
+	if (!ib_nl_cancel_request(query))
+		ib_cancel_mad(agent, mad_buf);
 }
 EXPORT_SYMBOL(ib_sa_cancel_query);
 
@@ -638,6 +1134,14 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 	query->mad_buf->context[0] = query;
 	query->id = id;
 
+	if (IB_SA_LOCAL_SVC_ENABLED(query)) {
+		if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
+			if (!ib_nl_make_request(query))
+				return id;
+		}
+		IB_SA_DISABLE_LOCAL_SVC(query);
+	}
+
 	ret = ib_post_send_mad(query->mad_buf, NULL);
 	if (ret) {
 		spin_lock_irqsave(&idr_lock, flags);
@@ -766,6 +1270,9 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
 
 	*sa_query = &query->sa_query;
 
+	IB_SA_ENABLE_LOCAL_SVC(&query->sa_query);
+	query->sa_query.mad_buf->context[1] = rec;
+
 	ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
 	if (ret < 0)
 		goto err2;
@@ -1250,6 +1757,8 @@ static int __init ib_sa_init(void)
 
 	get_random_bytes(&tid, sizeof tid);
 
+	atomic_set(&ib_nl_sa_request_seq, 0);
+
 	ret = ib_register_client(&sa_client);
 	if (ret) {
 		printk(KERN_ERR "Couldn't register ib_sa client\n");
@@ -1262,7 +1771,25 @@ static int __init ib_sa_init(void)
 		goto err2;
 	}
 
+	ib_nl_wq = create_singlethread_workqueue("ib_nl_sa_wq");
+	if (!ib_nl_wq) {
+		ret = -ENOMEM;
+		goto err3;
+	}
+
+	if (ibnl_add_client(RDMA_NL_SA, RDMA_NL_LS_NUM_OPS,
+			    ib_sa_cb_table)) {
+		pr_err("Failed to add netlink callback\n");
+		ret = -EINVAL;
+		goto err4;
+	}
+	INIT_DELAYED_WORK(&ib_nl_timed_work, ib_nl_request_timeout);
+
 	return 0;
+err4:
+	destroy_workqueue(ib_nl_wq);
+err3:
+	mcast_cleanup();
 err2:
 	ib_unregister_client(&sa_client);
 err1:
@@ -1271,6 +1798,10 @@ err1:
 
 static void __exit ib_sa_cleanup(void)
 {
+	ibnl_remove_client(RDMA_NL_SA);
+	cancel_delayed_work(&ib_nl_timed_work);
+	flush_workqueue(ib_nl_wq);
+	destroy_workqueue(ib_nl_wq);
 	mcast_cleanup();
 	ib_unregister_client(&sa_client);
 	idr_destroy(&query_idr);
-- 
1.7.1
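
For context, a hedged sketch of a minimal user-space success reply, laid
out to satisfy the kernel-side checks in ib_nl_handle_resolve_resp() and
ib_nl_is_good_resolve_resp() above. struct ib_path_rec_data comes from
<rdma/ib_user_sa.h>; the helper name and its arguments are hypothetical
(assumed includes: <string.h>, <sys/socket.h>, <linux/netlink.h>, the
patched <rdma/rdma_netlink.h>, and <rdma/ib_user_sa.h>):

    static int send_resolve_ok(int nl_fd, __u32 req_seq,
                               const __u32 *path_rec /* 16 words, wire format */)
    {
            struct {
                    struct nlmsghdr         nlh;
                    struct nlattr           attr;
                    struct ib_path_rec_data rec;
            } resp;

            memset(&resp, 0, sizeof(resp));
            resp.nlh.nlmsg_len   = sizeof(resp);
            resp.nlh.nlmsg_type  = RDMA_NL_GET_TYPE(RDMA_NL_SA,
                                                    RDMA_NL_LS_OP_RESOLVE);
            resp.nlh.nlmsg_flags = NLM_F_REQUEST | RDMA_NL_LS_F_OK;
            resp.nlh.nlmsg_seq   = req_seq;  /* must echo the request seq */
            resp.attr.nla_len    = sizeof(resp.attr) + sizeof(resp.rec);
            resp.attr.nla_type   = LS_NLA_TYPE_PATH_RECORD;
            resp.rec.flags       = IB_PATH_PRIMARY | IB_PATH_GMP |
                                   IB_PATH_BIDIRECTIONAL;
            memcpy(resp.rec.path_rec, path_rec, sizeof(resp.rec.path_rec));
            return send(nl_fd, &resp, sizeof(resp), 0) == sizeof(resp) ? 0 : -1;
    }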



* Re: [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]     ` <1434117966-17978-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-06-25 23:16       ` Jason Gunthorpe
       [not found]         ` <20150625231636.GA7901-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2015-06-25 23:16 UTC (permalink / raw)
  To: kaike.wan-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck, Ira Weiny

On Fri, Jun 12, 2015 at 10:06:06AM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> This patch routes a SA pathrecord query to netlink first and processes the
> response appropriately. If a failure is returned, the request will be sent
> through IB. The decision whether to route the request to netlink first is
> determined by the presence of a listener for the local service netlink
> multicast group. If the user-space local service netlink multicast group
> listener is not present, the request will be sent through IB, just like
> what is currently being done.

I'd really like to test this out here, I was hoping to get to that
this week, but not yet.. And I was still hoping to read through
carefully.

Anyhow, a brief check seemed like things make sense

> +struct ib_nl_attr_info {
> +	u16 len;	/* Total attr len: header + payload + padding */
> +	ib_sa_comp_mask comp_mask;
> +	void *input;
> +	void (*set_attrs)(struct sk_buff *skb, struct ib_nl_attr_info
> *info);

I don't really get what this is for, set_attrs is only ever equal to
ib_nl_set_path_rec_attrs - is this whole indirection left over from
something else? Drop it?

This code does wander quite a bit, lots of functions are called from
only one place, not necessarily bad, but at least topo sort them so the
one-place called function is before the only caller... Makes it easier
to read

> +	struct ib_sa_path_rec *sa_rec = info->input;
> +	__u8 val1;
> +	__u16 val2;
> +	__u64 val3;

At least use val8, val16, val64 for names and IIRC, the __ is not used
inside the kernel proper

> +
> +	if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
> +	    sa_rec->reversible != 0) {
> +		if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) &&
> +		    sa_rec->numb_path > 1)

The kernel never sets numb path, I would just never set _ALL for now
and leave it as a placeholder.

> +			val1 = LS_NLA_PATH_USE_ALL;
> +		else
> +			val1 = LS_NLA_PATH_USE_GMP;
> +	} else {
> +		val1 = LS_NLA_PATH_USE_UNIDIRECTIONAL;
> +	}
> +	nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE, sizeof(val1),
> +		&val1);

This uses a u8 but other places are not using that:
+    case LS_NLA_TYPE_PATH_USE:
+    	  use = (struct rdma_nla_ls_path_use *) NLA_DATA(attr);

> +			if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> +				rec = nla_data(curr);
> +				if (rec->flags && IB_PATH_PRIMARY) {

&& is the wrong operator isn't it?

IB_PATH_PRIMARY is the wrong test, I think I covered this already..

I still feel like the netlink specific stuff should live in its own
file, not be jammed into sa_query.c - unless it is really hard to
disentangle, but it doesn't look too bad at first glance..

> +static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info)
> +{
> +	if (len > 0)
> +		len += nla_total_size(sizeof(struct rdma_nla_ls_path_use));

Seems a bit goofy, I don't think we should ever generate an empty
netlink query. That feels like a WARN_ON scenario.

> +	skb = nlmsg_new(attrs->len, GFP_KERNEL);
> +	if (!skb) {
> +		pr_err("alloc failed ret=%d\n", ret);

Doesn't alloc failure already print enough?

> +		if (ret != -ESRCH)
> +			pr_err("ibnl_multicast failed l=%d, r=%d\n",
> +			       attrs->len, ret);

What is this print about? Seems strange.

Jason


* RE: [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink
       [not found]         ` <20150625231636.GA7901-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-06-26 14:49           ` Wan, Kaike
  0 siblings, 0 replies; 7+ messages in thread
From: Wan, Kaike @ 2015-06-26 14:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fleck, John,
	Weiny, Ira

> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Thursday, June 25, 2015 7:17 PM
> To: Wan, Kaike
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v6 4/4] IB/sa: Route SA pathrecord query through netlink
> 
> On Fri, Jun 12, 2015 at 10:06:06AM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > This patch routes a SA pathrecord query to netlink first and processes
> > the response appropriately. If a failure is returned, the request will
> > be sent through IB. The decision whether to route the request to
> > netlink first is determined by the presence of a listener for the
> > local service netlink multicast group. If the user-space local service
> > netlink multicast group listener is not present, the request will be
> > sent through IB, just like what is currently being done.
> 
> I'd really like to test this out here, I was hoping to get to that this week, but
> not yet.. And I was still hoping to read through carefully.
> 
> Anyhow, brief check seemed like things make sense

Thank you.

> 
> > +struct ib_nl_attr_info {
> > +	u16 len;	/* Total attr len: header + payload + padding */
> > +	ib_sa_comp_mask comp_mask;
> > +	void *input;
> > +	void (*set_attrs)(struct sk_buff *skb, struct ib_nl_attr_info
> > *info);
> 
> I don't really get what this is for, set_attrs is only ever equal to
> ib_nl_set_path_rec_attrs - is this whole indirection left over from something
> else? Drop it?

For this patch, set_attrs is equal to ib_nl_set_path_rec_attrs. However,
in the future, the same netlink mechanism could easily be extended to
other SA queries, such as MCMemberRecord. In that case, set_attrs would
be set to another function. Alternatively, instead of differentiating the
request type (pathrecord or other SA queries) in ib_nl_send_request(), we
could do it in ib_nl_send_msg() under the spinlock. In that case, we
would need neither the set_attrs field nor struct ib_nl_attr_info.
However, we would have to pass all the input into ib_nl_send_msg() with
rinfo (struct ib_nl_request_info *), essentially moving the switch
statement from ib_nl_send_request() into ib_nl_send_msg(). In that case,
ib_nl_send_msg() might have to be renamed because it would function more
like the current ib_nl_send_request()...

Do you really prefer that? It's all about parameter passing.

> 
> This code does wander quite a bit, lots of functions are called from only one
> place, not necessarily bad, but at least topo sort them so the one-place
> called function is before the only caller... Makes it easier to read
> 
> > +	struct ib_sa_path_rec *sa_rec = info->input;
> > +	__u8 val1;
> > +	__u16 val2;
> > +	__u64 val3;
> 
> At least use val8, val16, val64 for names and IIRC, the __ is not used inside
> the kernel proper

I will use u8/u16/u64 instead and change the names accordingly.

Thank you for pointing it out.

> 
> > +
> > +	if ((info->comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
> > +	    sa_rec->reversible != 0) {
> > +		if ((info->comp_mask & IB_SA_PATH_REC_NUMB_PATH) &&
> > +		    sa_rec->numb_path > 1)
> 
> The kernel never sets numb path, I would just never set _ALL for now and
> leave it as a placeholder.

rdma_cm sets numb_path to 1. Ipoib and srp do not set it at all. Therefore, _ALL is never set.

Can we just leave the code here in case someone might want to set numb_path > 1?

> 
> > +			val1 = LS_NLA_PATH_USE_ALL;
> > +		else
> > +			val1 = LS_NLA_PATH_USE_GMP;
> > +	} else {
> > +		val1 = LS_NLA_PATH_USE_UNIDIRECTIONAL;
> > +	}
> > +	nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PATH_USE,
> sizeof(val1),
> > +		&val1);
> 
> This uses a u8 but other places are not using that:

Are you referring to u8 instead of __u8? I will replace __u8 with u8 in the netlink code in sa_query.c.

> +    case LS_NLA_TYPE_PATH_USE:
> +    	  use = (struct rdma_nla_ls_path_use *) NLA_DATA(attr);
> 
> > +			if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> > +				rec = nla_data(curr);
> > +				if (rec->flags && IB_PATH_PRIMARY) {
> 
> && is the wrong operator isn't it?

You are right, it should be "&".
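
That is, the check becomes:

    if (rec->flags & IB_PATH_PRIMARY) {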

> 
> IB_PATH_PRIMARY is the wrong test, I think I covered this already..

Could you be more specific? What do you think would be a good test here?

> 
> I still feel like the netlink specific stuff should live in its own file, not be
> jammed into sa_query.c - unless it is really hard to disentangle, but it doesn't
> look too bad at first glance..

It could be done. We would need a header file for the netlink functions
used by sa_query.c (to register/unregister the rdma netlink client, to
cancel requests, etc.). In addition, we would also need another header
file from sa_query.c for the structures and functions used by the netlink
code. Of course, those functions could not be static anymore. For these
reasons, I prefer to keep them together.


> 
> > +static int ib_nl_get_path_rec_attrs_len(struct ib_nl_attr_info *info)
> > +{
> > +	if (len > 0)
> > +		len += nla_total_size(sizeof(struct rdma_nla_ls_path_use));
> 
> Seems a bit goofy, I don't think we should ever generate an empty netlink
> query. That feels like a WARN_ON scenario.

I will use WARN_ON here.


> 
> > +	skb = nlmsg_new(attrs->len, GFP_KERNEL);
> > +	if (!skb) {
> > +		pr_err("alloc failed ret=%d\n", ret);
> 
> Doesn't alloc failure already print enough?

I will remove the print.

> 
> > +		if (ret != -ESRCH)
> > +			pr_err("ibnl_multicast failed l=%d, r=%d\n",
> > +			       attrs->len, ret);
> 
> What is this print about? Seem strange?

I will remove the print.

Kaike

> 
> Jason

