* [PATCH v8 0/4] Sending kernel pathrecord query to user cache server
@ 2015-07-09 17:34 kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-07-09 17:34 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan
From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
An SA cache is critical for fabric scalability and performance.
In user space, the ibacm application provides a good example of a
pathrecord cache for address and route resolution. With the recent
implementation of the provider architecture, ibacm offers more
extensibility as an SA cache. In the kernel, ipoib implements its own
small pathrecord cache, which is, however, not available for general use.
Furthermore, implementing an SA cache in user space offers better
flexibility, larger capacity, and more robustness for the system.
In this patch series, a mechanism is implemented to allow ib_sa to
send pathrecord queries to a user-space application (e.g. ibacm) through
netlink. This mechanism could easily be extended to other SA queries.
With a customized test implemented in the rdma_cm module (not included
in this series), the time to retrieve 1 million pathrecords on a
two-node system dropped from 47053 jiffies (47.053 seconds) to 10339
jiffies (10.339 seconds), a reduction of 78%.
Changes since v7:
- Patch 1:
- Replace RDMA_NL_SA with RDMA_NL_LS;
- Remove the defines for status attribute;
- Remove RDMA_NL_LS_F_OK;
- Remove a few structures for simple attribute data;
- Add the family header for RESOLVE request;
- Add comments about different attributes.
- Patch 2:
- Add a helper function to receive netlink responses;
- Modify ibnl_rcv_msg() to invoke the callback directly for netlink
responses and the SET_TIMEOUT request, instead of using
netlink_dump_start().
- Patch 4:
- Replace the netlink macros with static inline functions;
- Simplify the request path with fewer and direct function calls;
- Fold the netlink request structure into the ib_sa_query structure;
- Drop the numb_path comparison when determining path_use;
- Encode the RESOLVE family header when building the request;
- Determine the anticipated pathrecord data flags by path_use;
- Use nla_parse() to parse SET_TIMEOUT request message;
Changes since v6:
- Patch 4:
- Replace __u8/16/64 with u8/16/64;
- Remove the pathrecord flags testing when checking a netlink response;
- Remove a few error prints;
Changes since v5:
- Patch 1:
- Replace reversible and numb_path attributes with path_use attribute.
- Define Mandatory attribute flag.
- Define attribute data types in cpu byte order.
- Patch 4:
- Change the calculation of total attribute len;
- Modify the setting of attributes.
Changes since v4:
- Patch 1: rename LS_NLA_TYPE_NUM_PATH as LS_NLA_TYPE_NUMB_PATH.
- Patch 4: remove the renaming of LS_NLA_TYPE_NUM_PATH as
LS_NLA_TYPE_NUMB_PATH.
Changes since v3:
- Patch 1: add basic RESOLVE attribute types.
- Patch 4: change the encoding of the RESOLVE request message based on
the new attribute types and the input comp_mask. Change the response
handling by iterating all attributes.
Changes since v2:
- Redesigned the communication protocol between the kernel and the
user-space application. Instead of the MAD packet format, the new
protocol uses a netlink message header and attributes to exchange
requests and responses between the kernel and user space. The design
was described here:
http://www.spinics.net/lists/linux-rdma/msg25621.html
Changes since v1:
- Move kzalloc changes into a separate patch (Patch 3).
- Remove redundant include line (Patch 4).
- Rename struct rdma_nl_resp_msg as structure ib_nl_resp_msg (Patch 4).
Kaike Wan (4):
IB/netlink: Add defines for local service requests through netlink
IB/core: Add rdma netlink helper functions
IB/sa: Allocate SA query with kzalloc
IB/sa: Route SA pathrecord query through netlink
drivers/infiniband/core/netlink.c | 55 +++++
drivers/infiniband/core/sa_query.c | 473 +++++++++++++++++++++++++++++++++++-
include/rdma/rdma_netlink.h | 7 +
include/uapi/rdma/rdma_netlink.h | 83 +++++++
4 files changed, 613 insertions(+), 5 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-07-09 17:34 ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-2-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-09 17:34 ` [PATCH v8 2/4] IB/core: Add rdma netlink helper functions kaike.wan-ral2JQCrhuEAvxtiuMwx3w
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-07-09 17:34 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny
From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.
Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
include/uapi/rdma/rdma_netlink.h | 83 ++++++++++++++++++++++++++++++++++++++
1 files changed, 83 insertions(+), 0 deletions(-)
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 6e4bb42..97d5711 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -7,12 +7,14 @@ enum {
RDMA_NL_RDMA_CM = 1,
RDMA_NL_NES,
RDMA_NL_C4IW,
+ RDMA_NL_LS, /* RDMA Local Services */
RDMA_NL_NUM_CLIENTS
};
enum {
RDMA_NL_GROUP_CM = 1,
RDMA_NL_GROUP_IWPM,
+ RDMA_NL_GROUP_LS,
RDMA_NL_NUM_GROUPS
};
@@ -128,5 +130,86 @@ enum {
IWPM_NLA_ERR_MAX
};
+/*
+ * Local service operations:
+ * RESOLVE - The client requests the local service to resolve a path.
+ * SET_TIMEOUT - The local service requests the client to set the timeout.
+ */
+enum {
+ RDMA_NL_LS_OP_RESOLVE = 0,
+ RDMA_NL_LS_OP_SET_TIMEOUT,
+ RDMA_NL_LS_NUM_OPS
+};
+
+/* Local service netlink message flags */
+#define RDMA_NL_LS_F_ERR 0x0100 /* Failed response */
+
+/*
+ * Local service resolve operation family header.
+ * The layout for the resolve operation:
+ * nlmsg header
+ * family header
+ * attributes
+ */
+
+/*
+ * Local service path use:
+ * Specify how the path(s) will be used.
+ * ALL - For connected CM operation (6 pathrecords)
+ * UNIDIRECTIONAL - For unidirectional UD (1 pathrecord)
+ * GMP - For miscellaneous GMP like operation (at least 1 reversible
+ * pathrecord)
+ */
+enum {
+ LS_RESOLVE_PATH_USE_ALL = 0,
+ LS_RESOLVE_PATH_USE_UNIDIRECTIONAL,
+ LS_RESOLVE_PATH_USE_GMP,
+ LS_RESOLVE_PATH_USE_MAX
+};
+
+#define LS_DEVICE_NAME_MAX 64
+
+struct rdma_ls_resolve_header {
+ __u8 device_name[LS_DEVICE_NAME_MAX];
+ __u8 port_num;
+ __u8 path_use;
+};
+
+/* Local service attribute type */
+#define RDMA_NLA_F_MANDATORY (1 << 13)
+#define RDMA_NLA_TYPE_MASK (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
+ RDMA_NLA_F_MANDATORY))
+
+/*
+ * Local service attributes:
+ * Attr Name Size Byte order
+ * -----------------------------------------------------
+ * STATUS u32 cpu
+ * PATH_RECORD struct ib_path_rec_data
+ * TIMEOUT u32 cpu
+ * SERVICE_ID u64 cpu
+ * DGID u8[16] BE
+ * SGID u8[16] BE
+ * TCLASS u8
+ * PKEY u16 cpu
+ * QOS_CLASS u16 cpu
+ */
+enum {
+ LS_NLA_TYPE_STATUS = 0,
+ LS_NLA_TYPE_PATH_RECORD,
+ LS_NLA_TYPE_TIMEOUT,
+ LS_NLA_TYPE_SERVICE_ID,
+ LS_NLA_TYPE_DGID,
+ LS_NLA_TYPE_SGID,
+ LS_NLA_TYPE_TCLASS,
+ LS_NLA_TYPE_PKEY,
+ LS_NLA_TYPE_QOS_CLASS,
+ LS_NLA_TYPE_MAX
+};
+
+/* Local service DGID/SGID attribute: big endian */
+struct rdma_nla_ls_gid {
+ __u8 gid[16];
+};
#endif /* _UAPI_RDMA_NETLINK_H */
--
1.7.1
* [PATCH v8 2/4] IB/core: Add rdma netlink helper functions
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-09 17:34 ` [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-07-09 17:34 ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-3-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-09 17:34 ` [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
2015-07-09 17:34 ` [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
3 siblings, 1 reply; 13+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-07-09 17:34 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny
From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.
Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/core/netlink.c | 55 +++++++++++++++++++++++++++++++++++++
include/rdma/rdma_netlink.h | 7 +++++
2 files changed, 62 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index 23dd5a5..d47df93 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
static struct sock *nls;
static LIST_HEAD(client_list);
+int ibnl_chk_listeners(unsigned int group)
+{
+ if (netlink_has_listeners(nls, group) == 0)
+ return -1;
+ return 0;
+}
+EXPORT_SYMBOL(ibnl_chk_listeners);
+
int ibnl_add_client(int index, int nops,
const struct ibnl_client_cbs cb_table[])
{
@@ -151,6 +159,23 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
!client->cb_table[op].dump)
return -EINVAL;
+ /*
+ * For response or local service set_timeout request,
+ * there is no need to use netlink_dump_start.
+ */
+ if (!(nlh->nlmsg_flags & NLM_F_REQUEST) ||
+ (index == RDMA_NL_LS &&
+ op == RDMA_NL_LS_OP_SET_TIMEOUT)) {
+ struct netlink_callback cb = {
+ .skb = skb,
+ .nlh = nlh,
+ .dump = client->cb_table[op].dump,
+ .module = client->cb_table[op].module,
+ };
+
+ return cb.dump(skb, &cb);
+ }
+
{
struct netlink_dump_control c = {
.dump = client->cb_table[op].dump,
@@ -165,9 +190,39 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
}
+static void ibnl_rcv_reply_skb(struct sk_buff *skb)
+{
+ struct nlmsghdr *nlh;
+ int msglen;
+
+ /*
+ * Process responses until there are no more messages or until the
+ * first request is encountered. Generally speaking, it is not
+ * recommended to mix responses with requests.
+ */
+ while (skb->len >= nlmsg_total_size(0)) {
+ nlh = nlmsg_hdr(skb);
+
+ if (nlh->nlmsg_len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len)
+ return;
+
+ /* Handle response only */
+ if (nlh->nlmsg_flags & NLM_F_REQUEST)
+ return;
+
+ ibnl_rcv_msg(skb, nlh);
+
+ msglen = NLMSG_ALIGN(nlh->nlmsg_len);
+ if (msglen > skb->len)
+ msglen = skb->len;
+ skb_pull(skb, msglen);
+ }
+}
+
static void ibnl_rcv(struct sk_buff *skb)
{
mutex_lock(&ibnl_mutex);
+ ibnl_rcv_reply_skb(skb);
netlink_rcv_skb(skb, &ibnl_rcv_msg);
mutex_unlock(&ibnl_mutex);
}
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 0790882..5852661 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -77,4 +77,11 @@ int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
int ibnl_multicast(struct sk_buff *skb, struct nlmsghdr *nlh,
unsigned int group, gfp_t flags);
+/**
+ * Check if there are any listeners on the netlink multicast group
+ * @group: the netlink multicast group ID
+ * Returns 0 if listeners are present, or a negative value otherwise.
+ */
+int ibnl_chk_listeners(unsigned int group);
+
#endif /* _RDMA_NETLINK_H */
--
1.7.1
* [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-09 17:34 ` [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
2015-07-09 17:34 ` [PATCH v8 2/4] IB/core: Add rdma netlink helper functions kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-07-09 17:34 ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
2015-07-09 17:34 ` [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
3 siblings, 0 replies; 13+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-07-09 17:34 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny
From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Replace kmalloc with kzalloc so that all uninitialized fields in the SA
query are zeroed out, avoiding unintended consequences. This prepares
the SA query structure to accept new fields in the future.
Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/core/sa_query.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 1f0d009..59022fa 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -739,7 +739,7 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
- query = kmalloc(sizeof *query, gfp_mask);
+ query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@@ -861,7 +861,7 @@ int ib_sa_service_rec_query(struct ib_sa_client *client,
method != IB_SA_METHOD_DELETE)
return -EINVAL;
- query = kmalloc(sizeof *query, gfp_mask);
+ query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@@ -953,7 +953,7 @@ int ib_sa_mcmember_rec_query(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
- query = kmalloc(sizeof *query, gfp_mask);
+ query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
@@ -1050,7 +1050,7 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
port = &sa_dev->port[port_num - sa_dev->start_port];
agent = port->agent;
- query = kmalloc(sizeof *query, gfp_mask);
+ query = kzalloc(sizeof(*query), gfp_mask);
if (!query)
return -ENOMEM;
--
1.7.1
* [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
` (2 preceding siblings ...)
2015-07-09 17:34 ` [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
@ 2015-07-09 17:34 ` kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
3 siblings, 1 reply; 13+ messages in thread
From: kaike.wan-ral2JQCrhuEAvxtiuMwx3w @ 2015-07-09 17:34 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Kaike Wan, John Fleck, Ira Weiny
From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
This patch routes an SA pathrecord query to netlink first and processes
the response appropriately. If a failure is returned, the request is
sent through IB. Whether to route the request to netlink first is
determined by the presence of a listener on the local service netlink
multicast group. If no user-space listener is present, the request is
sent through IB, just as is currently done.
Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/core/sa_query.c | 465 +++++++++++++++++++++++++++++++++++-
1 files changed, 464 insertions(+), 1 deletions(-)
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 59022fa..202bf08 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -45,12 +45,21 @@
#include <uapi/linux/if_ether.h>
#include <rdma/ib_pack.h>
#include <rdma/ib_cache.h>
+#include <rdma/rdma_netlink.h>
+#include <net/netlink.h>
+#include <uapi/rdma/ib_user_sa.h>
+#include <rdma/ib_marshall.h>
#include "sa.h"
MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("InfiniBand subnet administration query support");
MODULE_LICENSE("Dual BSD/GPL");
+#define IB_SA_LOCAL_SVC_TIMEOUT_MIN 100
+#define IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT 2000
+#define IB_SA_LOCAL_SVC_TIMEOUT_MAX 200000
+static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
+
struct ib_sa_sm_ah {
struct ib_ah *ah;
struct kref ref;
@@ -80,8 +89,16 @@ struct ib_sa_query {
struct ib_mad_send_buf *mad_buf;
struct ib_sa_sm_ah *sm_ah;
int id;
+ u32 flags;
+ struct list_head list; /* Local svc request list */
+ u32 seq; /* Local svc request sequence number */
+ unsigned long timeout; /* Local svc timeout */
+ u8 path_use; /* How will the pathrecord be used */
};
+#define IB_SA_ENABLE_LOCAL_SERVICE 0x00000001
+#define IB_SA_CANCEL 0x00000002
+
struct ib_sa_service_query {
void (*callback)(int, struct ib_sa_service_rec *, void *);
void *context;
@@ -106,6 +123,12 @@ struct ib_sa_mcmember_query {
struct ib_sa_query sa_query;
};
+static LIST_HEAD(ib_nl_request_list);
+static DEFINE_SPINLOCK(ib_nl_request_lock);
+static atomic_t ib_nl_sa_request_seq;
+static struct workqueue_struct *ib_nl_wq;
+static struct delayed_work ib_nl_timed_work;
+
static void ib_sa_add_one(struct ib_device *device);
static void ib_sa_remove_one(struct ib_device *device);
@@ -381,6 +404,405 @@ static const struct ib_field guidinfo_rec_table[] = {
.size_bits = 512 },
};
+static inline void ib_sa_disable_local_svc(struct ib_sa_query *query)
+{
+ query->flags &= ~IB_SA_ENABLE_LOCAL_SERVICE;
+}
+
+static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
+{
+ return (query->flags & IB_SA_CANCEL);
+}
+
+static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
+ struct ib_sa_query *query)
+{
+ struct ib_sa_path_rec *sa_rec = query->mad_buf->context[1];
+ struct ib_sa_mad *mad = query->mad_buf->mad;
+ ib_sa_comp_mask comp_mask = mad->sa_hdr.comp_mask;
+ u16 val16;
+ u64 val64;
+ struct rdma_ls_resolve_header *header;
+
+ query->mad_buf->context[1] = NULL;
+
+ /* Construct the family header first */
+ header = (struct rdma_ls_resolve_header *)
+ skb_put(skb, NLMSG_ALIGN(sizeof(*header)));
+ memcpy(header->device_name, query->port->agent->device->name,
+ LS_DEVICE_NAME_MAX);
+ header->port_num = query->port->port_num;
+
+ if ((comp_mask & IB_SA_PATH_REC_REVERSIBLE) &&
+ sa_rec->reversible != 0)
+ query->path_use = LS_RESOLVE_PATH_USE_GMP;
+ else
+ query->path_use = LS_RESOLVE_PATH_USE_UNIDIRECTIONAL;
+ header->path_use = query->path_use;
+
+ /* Now build the attributes */
+ if (comp_mask & IB_SA_PATH_REC_SERVICE_ID) {
+ val64 = be64_to_cpu(sa_rec->service_id);
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SERVICE_ID,
+ sizeof(val64), &val64);
+ }
+ if (comp_mask & IB_SA_PATH_REC_DGID)
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_DGID,
+ sizeof(sa_rec->dgid), &sa_rec->dgid);
+ if (comp_mask & IB_SA_PATH_REC_SGID)
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_SGID,
+ sizeof(sa_rec->sgid), &sa_rec->sgid);
+ if (comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_TCLASS,
+ sizeof(sa_rec->traffic_class), &sa_rec->traffic_class);
+
+ if (comp_mask & IB_SA_PATH_REC_PKEY) {
+ val16 = be16_to_cpu(sa_rec->pkey);
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_PKEY,
+ sizeof(val16), &val16);
+ }
+ if (comp_mask & IB_SA_PATH_REC_QOS_CLASS) {
+ val16 = be16_to_cpu(sa_rec->qos_class);
+ nla_put(skb, RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_QOS_CLASS,
+ sizeof(val16), &val16);
+ }
+}
+
+static int ib_nl_get_path_rec_attrs_len(ib_sa_comp_mask comp_mask)
+{
+ int len = 0;
+
+ if (comp_mask & IB_SA_PATH_REC_SERVICE_ID)
+ len += nla_total_size(sizeof(u64));
+ if (comp_mask & IB_SA_PATH_REC_DGID)
+ len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
+ if (comp_mask & IB_SA_PATH_REC_SGID)
+ len += nla_total_size(sizeof(struct rdma_nla_ls_gid));
+ if (comp_mask & IB_SA_PATH_REC_TRAFFIC_CLASS)
+ len += nla_total_size(sizeof(u8));
+ if (comp_mask & IB_SA_PATH_REC_PKEY)
+ len += nla_total_size(sizeof(u16));
+ if (comp_mask & IB_SA_PATH_REC_QOS_CLASS)
+ len += nla_total_size(sizeof(u16));
+
+ /*
+ * Make sure that at least some of the required comp_mask bits are
+ * set.
+ */
+ if (WARN_ON(len == 0))
+ return len;
+
+ /* Add the family header */
+ len += NLMSG_ALIGN(sizeof(struct rdma_ls_resolve_header));
+
+ return len;
+}
+
+static int ib_nl_send_msg(struct ib_sa_query *query)
+{
+ struct sk_buff *skb = NULL;
+ struct nlmsghdr *nlh;
+ void *data;
+ int ret = 0;
+ struct ib_sa_mad *mad;
+ int len;
+
+ mad = query->mad_buf->mad;
+ len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
+ if (len <= 0)
+ return -EMSGSIZE;
+
+ skb = nlmsg_new(len, GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+
+ /* Put nlmsg header only for now */
+ data = ibnl_put_msg(skb, &nlh, query->seq, 0, RDMA_NL_LS,
+ RDMA_NL_LS_OP_RESOLVE, GFP_KERNEL);
+ if (!data) {
+ kfree_skb(skb);
+ return -EMSGSIZE;
+ }
+
+ /* Add attributes */
+ ib_nl_set_path_rec_attrs(skb, query);
+
+ /* Repair the nlmsg header length */
+ nlmsg_end(skb, nlh);
+
+ ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
+ if (!ret)
+ ret = len;
+ else
+ ret = 0;
+
+ return ret;
+}
+
+static int ib_nl_make_request(struct ib_sa_query *query)
+{
+ unsigned long flags;
+ unsigned long delay;
+ int ret;
+
+ INIT_LIST_HEAD(&query->list);
+ query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ ret = ib_nl_send_msg(query);
+ if (ret <= 0) {
+ ret = -EIO;
+ goto request_out;
+ } else {
+ ret = 0;
+ }
+
+ delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+ query->timeout = delay + jiffies;
+ list_add_tail(&query->list, &ib_nl_request_list);
+ /* Start the timeout if this is the only request */
+ if (ib_nl_request_list.next == &query->list)
+ queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+
+request_out:
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+ return ret;
+}
+
+static int ib_nl_cancel_request(struct ib_sa_query *query)
+{
+ unsigned long flags;
+ struct ib_sa_query *wait_query;
+ int found = 0;
+
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ list_for_each_entry(wait_query, &ib_nl_request_list, list) {
+ /* Let the timeout routine take care of the callback */
+ if (query == wait_query) {
+ query->flags |= IB_SA_CANCEL;
+ query->timeout = jiffies;
+ list_move(&query->list, &ib_nl_request_list);
+ found = 1;
+ mod_delayed_work(ib_nl_wq, &ib_nl_timed_work, 1);
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+ return found;
+}
+
+static void send_handler(struct ib_mad_agent *agent,
+ struct ib_mad_send_wc *mad_send_wc);
+
+static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
+ const struct nlmsghdr *nlh)
+{
+ struct ib_mad_send_wc mad_send_wc;
+ struct ib_sa_mad *mad = NULL;
+ const struct nlattr *head, *curr;
+ struct ib_path_rec_data *rec;
+ int len, rem;
+ u32 mask = 0;
+ int status = -EIO;
+
+ if (query->callback) {
+ head = (const struct nlattr *) nlmsg_data(nlh);
+ len = nlmsg_len(nlh);
+ switch (query->path_use) {
+ case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL:
+ mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
+ break;
+
+ case LS_RESOLVE_PATH_USE_ALL:
+ case LS_RESOLVE_PATH_USE_GMP:
+ default:
+ mask = IB_PATH_PRIMARY | IB_PATH_GMP |
+ IB_PATH_BIDIRECTIONAL;
+ break;
+ }
+ nla_for_each_attr(curr, head, len, rem) {
+ if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
+ rec = nla_data(curr);
+ /*
+ * Get the first one. In the future, we may
+ * need to get up to 6 pathrecords.
+ */
+ if (rec->flags & mask) {
+ mad = query->mad_buf->mad;
+ mad->mad_hdr.method |=
+ IB_MGMT_METHOD_RESP;
+ memcpy(mad->data, rec->path_rec,
+ sizeof(rec->path_rec));
+ status = 0;
+ break;
+ }
+ }
+ }
+ query->callback(query, status, mad);
+ }
+
+ mad_send_wc.send_buf = query->mad_buf;
+ mad_send_wc.status = IB_WC_SUCCESS;
+ send_handler(query->mad_buf->mad_agent, &mad_send_wc);
+}
+
+static void ib_nl_request_timeout(struct work_struct *work)
+{
+ unsigned long flags;
+ struct ib_sa_query *query;
+ unsigned long delay;
+ struct ib_mad_send_wc mad_send_wc;
+ int ret;
+
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ while (!list_empty(&ib_nl_request_list)) {
+ query = list_entry(ib_nl_request_list.next,
+ struct ib_sa_query, list);
+
+ if (time_after(query->timeout, jiffies)) {
+ delay = query->timeout - jiffies;
+ if ((long)delay <= 0)
+ delay = 1;
+ queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+ break;
+ }
+
+ list_del(&query->list);
+ ib_sa_disable_local_svc(query);
+ /* Hold the lock to protect against query cancellation */
+ if (ib_sa_query_cancelled(query))
+ ret = -1;
+ else
+ ret = ib_post_send_mad(query->mad_buf, NULL);
+ if (ret) {
+ mad_send_wc.send_buf = query->mad_buf;
+ mad_send_wc.status = IB_WC_WR_FLUSH_ERR;
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+ send_handler(query->port->agent, &mad_send_wc);
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ }
+ }
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+}
+
+static int ib_nl_handle_set_timeout(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+ int timeout, delta, abs_delta;
+ const struct nlattr *attr;
+ unsigned long flags;
+ struct ib_sa_query *query;
+ long delay = 0;
+ struct nlattr *tb[LS_NLA_TYPE_MAX + 1];
+ int ret;
+
+ ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh), nlmsg_len(nlh),
+ NULL);
+ attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
+ if (ret || !attr || nla_len(attr) != sizeof(u32))
+ goto settimeout_out;
+
+ timeout = *(int *) nla_data(attr);
+ if (timeout < IB_SA_LOCAL_SVC_TIMEOUT_MIN)
+ timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN;
+ if (timeout > IB_SA_LOCAL_SVC_TIMEOUT_MAX)
+ timeout = IB_SA_LOCAL_SVC_TIMEOUT_MAX;
+
+ delta = timeout - sa_local_svc_timeout_ms;
+ if (delta < 0)
+ abs_delta = -delta;
+ else
+ abs_delta = delta;
+
+ if (delta != 0) {
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ sa_local_svc_timeout_ms = timeout;
+ list_for_each_entry(query, &ib_nl_request_list, list) {
+ if (delta < 0 && abs_delta > query->timeout)
+ query->timeout = 0;
+ else
+ query->timeout += delta;
+
+ /* Get the new delay from the first entry */
+ if (!delay) {
+ delay = query->timeout - jiffies;
+ if (delay <= 0)
+ delay = 1;
+ }
+ }
+ if (delay)
+ mod_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+ (unsigned long)delay);
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+ }
+
+settimeout_out:
+ return skb->len;
+}
+
+static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
+ unsigned long flags;
+ struct ib_sa_query *query;
+ struct ib_mad_send_buf *send_buf;
+ struct ib_mad_send_wc mad_send_wc;
+ int found = 0;
+ int ret;
+
+ spin_lock_irqsave(&ib_nl_request_lock, flags);
+ list_for_each_entry(query, &ib_nl_request_list, list) {
+ /*
+ * If the query is cancelled, let the timeout routine
+ * take care of it.
+ */
+ if (nlh->nlmsg_seq == query->seq) {
+ found = !ib_sa_query_cancelled(query);
+ if (found)
+ list_del(&query->list);
+ break;
+ }
+ }
+
+ if (!found) {
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+ goto resp_out;
+ }
+
+ send_buf = query->mad_buf;
+
+ if (nlh->nlmsg_flags & RDMA_NL_LS_F_ERR) {
+ /* if the result is a failure, send out the packet via IB */
+ ib_sa_disable_local_svc(query);
+ ret = ib_post_send_mad(query->mad_buf, NULL);
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+ if (ret) {
+ mad_send_wc.send_buf = send_buf;
+ mad_send_wc.status = IB_WC_GENERAL_ERR;
+ send_handler(query->port->agent, &mad_send_wc);
+ }
+ } else {
+ spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+ ib_nl_process_good_resolve_rsp(query, nlh);
+ }
+
+resp_out:
+ return skb->len;
+}
+
+static struct ibnl_client_cbs ib_sa_cb_table[] = {
+ [RDMA_NL_LS_OP_RESOLVE] = {
+ .dump = ib_nl_handle_resolve_resp,
+ .module = THIS_MODULE },
+ [RDMA_NL_LS_OP_SET_TIMEOUT] = {
+ .dump = ib_nl_handle_set_timeout,
+ .module = THIS_MODULE },
+};
+
static void free_sm_ah(struct kref *kref)
{
struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref);
@@ -502,7 +924,13 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query)
mad_buf = query->mad_buf;
spin_unlock_irqrestore(&idr_lock, flags);
- ib_cancel_mad(agent, mad_buf);
+ /*
+ * If the query is still on the netlink request list, schedule
+ * it to be cancelled by the timeout routine. Otherwise, it has been
+ * sent to the MAD layer and has to be cancelled from there.
+ */
+ if (!ib_nl_cancel_request(query))
+ ib_cancel_mad(agent, mad_buf);
}
EXPORT_SYMBOL(ib_sa_cancel_query);
@@ -638,6 +1066,14 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
query->mad_buf->context[0] = query;
query->id = id;
+ if (query->flags & IB_SA_ENABLE_LOCAL_SERVICE) {
+ if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
+ if (!ib_nl_make_request(query))
+ return id;
+ }
+ ib_sa_disable_local_svc(query);
+ }
+
ret = ib_post_send_mad(query->mad_buf, NULL);
if (ret) {
spin_lock_irqsave(&idr_lock, flags);
@@ -766,6 +1202,9 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
*sa_query = &query->sa_query;
+ query->sa_query.flags |= IB_SA_ENABLE_LOCAL_SERVICE;
+ query->sa_query.mad_buf->context[1] = rec;
+
ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
if (ret < 0)
goto err2;
@@ -1254,6 +1693,8 @@ static int __init ib_sa_init(void)
get_random_bytes(&tid, sizeof tid);
+ atomic_set(&ib_nl_sa_request_seq, 0);
+
ret = ib_register_client(&sa_client);
if (ret) {
printk(KERN_ERR "Couldn't register ib_sa client\n");
@@ -1266,7 +1707,25 @@ static int __init ib_sa_init(void)
goto err2;
}
+ ib_nl_wq = create_singlethread_workqueue("ib_nl_sa_wq");
+ if (!ib_nl_wq) {
+ ret = -ENOMEM;
+ goto err3;
+ }
+
+ if (ibnl_add_client(RDMA_NL_LS, RDMA_NL_LS_NUM_OPS,
+ ib_sa_cb_table)) {
+ pr_err("Failed to add netlink callback\n");
+ ret = -EINVAL;
+ goto err4;
+ }
+ INIT_DELAYED_WORK(&ib_nl_timed_work, ib_nl_request_timeout);
+
return 0;
+err4:
+ destroy_workqueue(ib_nl_wq);
+err3:
+ mcast_cleanup();
err2:
ib_unregister_client(&sa_client);
err1:
@@ -1275,6 +1734,10 @@ err1:
static void __exit ib_sa_cleanup(void)
{
+ ibnl_remove_client(RDMA_NL_LS);
+ cancel_delayed_work(&ib_nl_timed_work);
+ flush_workqueue(ib_nl_wq);
+ destroy_workqueue(ib_nl_wq);
mcast_cleanup();
ib_unregister_client(&sa_client);
idr_destroy(&query_idr);
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink
[not found] ` <1436463268-32365-2-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-07-22 20:55 ` Jason Gunthorpe
[not found] ` <20150722205557.GA20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 20:55 UTC (permalink / raw)
To: kaike.wan-ral2JQCrhuEAvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck, Ira Weiny
On Thu, Jul 09, 2015 at 01:34:25PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> + LS_NLA_TYPE_STATUS = 0,
This is never used, drop it.
Jason
* Re: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
[not found] ` <1436463268-32365-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-07-22 21:09 ` Jason Gunthorpe
[not found] ` <20150722210918.GB20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 21:09 UTC (permalink / raw)
To: kaike.wan-ral2JQCrhuEAvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck, Ira Weiny
On Thu, Jul 09, 2015 at 01:34:28PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> This patch routes a SA pathrecord query to netlink first and processes the
> response appropriately. If a failure is returned, the request will be sent
> through IB. The decision whether to route the request to netlink first is
> determined by the presence of a listener for the local service netlink
> multicast group. If the user-space local service netlink multicast group
> listener is not present, the request will be sent through IB, just like
> what is currently being done.
So, I tested it again, and the UAPI looks pretty reasonable now.
The netlink validation still needs to be fixed.
> Here is where we might have a problem: nla_parse() allows only one
> entry for each attribute type in the returned array tb[]. If we have
> multiple (say 6) pathrecords returned, each having different flags
> (for different path use), a later one will replace an earlier one in
> tb[].
The parse is OK; continue to use the for loop. nla_parse() does more
than just extract into tb[]: it also validates that all the sizes are
correct. Ignore tb[].
My testing shows it seems to get into some kind of fast repeating
query loop:
[ 4904.969188] Return answer 0
[ 4904.969483] Return answer 0
[ 4904.969703] Return answer 0
[ 4904.969824] Return answer 0
[ 4904.969943] Return answer 0
[ 4904.970058] Return answer 0
[ 4904.970175] Return answer 0
[ 4904.970289] Return answer 0
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 63ea36d05072..e6b0181e7076 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -640,6 +640,8 @@ static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
}
}
}
+
+ printk("Return answer %u\n",status);
query->callback(query, status, mad);
}
You'll have to figure that out. I'm just doing ping X while running
a responder on the netlink socket; v7 didn't do this, IIRC.
I think it has something to do with the timer as the first request
works fine, then a pause, then a storm.
Actually, it looks like nothing removes nl queries from the timeout loop
when they successfully complete (i.e. this series doesn't even have a
hope of working properly).
Please actually test this patch completely and thoroughly before
sending another version. I'm broadly happy with this, so get your
whole team to look it over again.
> + if (query->callback) {
> + head = (const struct nlattr *) nlmsg_data(nlh);
> + len = nlmsg_len(nlh);
> + switch (query->path_use) {
> + case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL:
> + mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
> + break;
> +
> + case LS_RESOLVE_PATH_USE_ALL:
> + case LS_RESOLVE_PATH_USE_GMP:
> + default:
> + mask = IB_PATH_PRIMARY | IB_PATH_GMP |
> + IB_PATH_BIDIRECTIONAL;
> + break;
> + }
> + nla_for_each_attr(curr, head, len, rem) {
> + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> + rec = nla_data(curr);
> + /*
> + * Get the first one. In the future, we may
> + * need to get up to 6 pathrecords.
> + */
> + if (rec->flags & mask) {
(rec->flags & mask) == mask
All requested bits must be set, not just any requested bit.
An IB_PATH_PRIMARY | IB_PATH_OUTBOUND result does not satisfy the
requirement for LS_RESOLVE_PATH_USE_GMP.
The goal is to locate the one path of many that satisfies the
requirement. Future kernels will return 6-path answers.
> +static int ib_nl_handle_set_timeout(struct sk_buff *skb,
> + struct netlink_callback *cb)
> +{
> + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
> + int timeout, delta, abs_delta;
> + const struct nlattr *attr;
> + unsigned long flags;
> + struct ib_sa_query *query;
> + long delay = 0;
> + struct nlattr *tb[LS_NLA_TYPE_MAX + 1];
> + int ret;
> +
> + ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh), nlmsg_len(nlh),
> + NULL);
> + attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
> + if (ret || !attr || nla_len(attr) != sizeof(u32))
> + goto settimeout_out;
This probably doesn't need the nested attribute, but I'm indifferent.
Jason
* RE: [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink
[not found] ` <20150722205557.GA20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 12:41 ` Wan, Kaike
0 siblings, 0 replies; 13+ messages in thread
From: Wan, Kaike @ 2015-07-23 12:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fleck, John,
Weiny, Ira
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Wednesday, July 22, 2015 4:56 PM
> To: Wan, Kaike
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v8 1/4] IB/netlink: Add defines for local service requests
> through netlink
>
> On Thu, Jul 09, 2015 at 01:34:25PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > + LS_NLA_TYPE_STATUS = 0,
>
> This is never used, drop it.
Done.
* RE: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
[not found] ` <20150722210918.GB20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 19:26 ` Wan, Kaike
[not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CB55290-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Wan, Kaike @ 2015-07-23 19:26 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fleck, John,
Weiny, Ira
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Wednesday, July 22, 2015 5:09 PM
> To: Wan, Kaike
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
>
> On Thu, Jul 09, 2015 at 01:34:28PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > This patch routes a SA pathrecord query to netlink first and processes
> > the response appropriately. If a failure is returned, the request will
> > be sent through IB. The decision whether to route the request to
> > netlink first is determined by the presence of a listener for the
> > local service netlink multicast group. If the user-space local service
> > netlink multicast group listener is not present, the request will be
> > sent through IB, just like what is currently being done.
>
> So, I tested it again, and the UAPI looks pretty reasonable now.
>
> The netlink validation still needs to be fixed.
>
> > Here is where we might have a problem: nla_parse() allows only one
> > entry for each attribute type in the returned array tb[]. If we have
> > multiple (say 6) pathrecords returned, each having different flags
> > (for different path use), a later one will replace an earlier one in
> > tb[].
>
> The parse is OK; continue to use the for loop. nla_parse() does more than just
> extract into tb[]: it also validates that all the sizes are correct. Ignore tb[].
>
> My testing shows it seems to get into some kind of fast repeating query loop:
>
> [ 4904.969188] Return answer 0
> [ 4904.969483] Return answer 0
> [ 4904.969703] Return answer 0
> [ 4904.969824] Return answer 0
> [ 4904.969943] Return answer 0
> [ 4904.970058] Return answer 0
> [ 4904.970175] Return answer 0
> [ 4904.970289] Return answer 0
>
> diff --git a/drivers/infiniband/core/sa_query.c
> b/drivers/infiniband/core/sa_query.c
> index 63ea36d05072..e6b0181e7076 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -640,6 +640,8 @@ static void ib_nl_process_good_resolve_rsp(struct
> ib_sa_query *query,
> }
> }
> }
> +
> + printk("Return answer %u\n",status);
> query->callback(query, status, mad);
> }
>
> You'll have to figure that out. I'm just doing ping X while running a responder
> on the netlink socket, v7 didn't do this, IIRC.
I did not see such a repeated query loop. With a modified version of your debugging code:
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -653,6 +653,7 @@ static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query,
}
}
}
+ printk("Request %u returns answer %u\n", nlh->nlmsg_seq, status);
query->callback(query, status, mad);
}
I ran rping a few times and never saw such a behavior:
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 2 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 3 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 4 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 5 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]#
The nl sequence increased sequentially. What's the difference?
My debugging patch also has other print statements, and I have never seen such a loop at all.
>
> I think it has something to do with the timer as the first request works fine,
> then a pause, then a storm.
>
> Actually, looks like nothing removes nl queries from the timeout loop when
> they successfully complete. (ie this series doesn't even have a hope of
> working properly)
That is incorrect. The first thing ib_nl_handle_resolve_resp() does is find the request and remove it from ib_nl_request_list. The timeout routine removes from the netlink request list only the entries that have timed out.
>
> Please actually test this patch completely and thoroughly before sending
> another version.
Generally, I tested the patches (along with a debugging patch) with rping, ping, SRP, and another performance path (without the debugging patch). I also tested another ibacm patch for setting the timeout. When making major changes, I also ran tests for the timeout and error-response cases.
> I'm broadly happy with this, so get your whole team to look
> it over again.
>
> > + if (query->callback) {
> > + head = (const struct nlattr *) nlmsg_data(nlh);
> > + len = nlmsg_len(nlh);
> > + switch (query->path_use) {
> > + case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL:
> > + mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
> > + break;
> > +
> > + case LS_RESOLVE_PATH_USE_ALL:
> > + case LS_RESOLVE_PATH_USE_GMP:
> > + default:
> > + mask = IB_PATH_PRIMARY | IB_PATH_GMP |
> > + IB_PATH_BIDIRECTIONAL;
> > + break;
> > + }
> > + nla_for_each_attr(curr, head, len, rem) {
> > + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> > + rec = nla_data(curr);
> > + /*
> > + * Get the first one. In the future, we may
> > + * need to get up to 6 pathrecords.
> > + */
> > + if (rec->flags & mask) {
>
> (rec->flags & mask) == mask
>
> All requested bits must be set, not just any requested bit.
>
> An IB_PATH_PRIMARY | IB_PATH_OUTBOUND result does not satisfy the
> requirement for LS_RESOLVE_PATH_USE_GMP.
>
> The goal is to locate the one path of many that satisfies the requirement.
> Future kernels will return 6-path answers.
Got it.
>
> > +static int ib_nl_handle_set_timeout(struct sk_buff *skb,
> > + struct netlink_callback *cb)
> > +{
> > + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
> > + int timeout, delta, abs_delta;
> > + const struct nlattr *attr;
> > + unsigned long flags;
> > + struct ib_sa_query *query;
> > + long delay = 0;
> > + struct nlattr *tb[LS_NLA_TYPE_MAX + 1];
> > + int ret;
> > +
> > + ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh),
> nlmsg_len(nlh),
> > + NULL);
> > + attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
> > + if (ret || !attr || nla_len(attr) != sizeof(u32))
> > + goto settimeout_out;
>
> This probably doesn't need the nested attribute, but I'm indifferent.
It's not a nested attribute, just a u32 attribute. I will use a policy to validate the entire message (for the pathrecord response as well).
>
> Jason
* Re: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
[not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CB55290-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-07-23 20:14 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 20:14 UTC (permalink / raw)
To: Wan, Kaike
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fleck, John,
Weiny, Ira
On Thu, Jul 23, 2015 at 07:26:32PM +0000, Wan, Kaike wrote:
> The nl sequence increased sequentially. What's the difference?
Hmm, it looks like in my version the DGID was getting mangled, which
caused ipoib to enter a tight query loop.
Not your problem.
> > Actually, looks like nothing removes nl queries from the timeout loop when
> > they successfully complete. (ie this series doesn't even have a hope of
> > working properly)
>
> That is incorrect. The first thing in ib_nl_handle_resolve_resp is
> to find the request and remove it from the ib_nl_rquest_list. the
> timeout routine will remove the entries that have timed out from the
> nl request list.
Yep, okay, I see it now.
Jason
* Re: [PATCH v8 2/4] IB/core: Add rdma netlink helper functions
[not found] ` <1436463268-32365-3-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2015-08-04 3:15 ` Jason Gunthorpe
[not found] ` <20150804031534.GA28707-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2015-08-04 3:15 UTC (permalink / raw)
To: kaike.wan-ral2JQCrhuEAvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck, Ira Weiny
On Thu, Jul 09, 2015 at 01:34:26PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> This patch adds a function to check if listeners for a netlink multicast
> group are present. It also adds a function to receive netlink response
> messages.
>
> Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> drivers/infiniband/core/netlink.c | 55 +++++++++++++++++++++++++++++++++++++
> include/rdma/rdma_netlink.h | 7 +++++
> 2 files changed, 62 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
> index 23dd5a5..d47df93 100644
> +++ b/drivers/infiniband/core/netlink.c
> @@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
> static struct sock *nls;
> static LIST_HEAD(client_list);
>
> +int ibnl_chk_listeners(unsigned int group)
> +{
> + if (netlink_has_listeners(nls, group) == 0)
> + return -1;
> + return 0;
> +}
> +EXPORT_SYMBOL(ibnl_chk_listeners);
I was thinking about this today, and, where is the security?
What prevents a non-root user from making the above true and/or worse?
Jason
* Re: [PATCH v8 2/4] IB/core: Add rdma netlink helper functions
[not found] ` <20150804031534.GA28707-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-05 0:48 ` ira.weiny
[not found] ` <20150805004830.GA23589-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: ira.weiny @ 2015-08-05 0:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kaike.wan-ral2JQCrhuEAvxtiuMwx3w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck
On Mon, Aug 03, 2015 at 09:15:34PM -0600, Jason Gunthorpe wrote:
> On Thu, Jul 09, 2015 at 01:34:26PM -0400, kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:
> > From: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >
> > This patch adds a function to check if listeners for a netlink multicast
> > group are present. It also adds a function to receive netlink response
> > messages.
> >
> > Signed-off-by: Kaike Wan <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > Signed-off-by: John Fleck <john.fleck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > drivers/infiniband/core/netlink.c | 55 +++++++++++++++++++++++++++++++++++++
> > include/rdma/rdma_netlink.h | 7 +++++
> > 2 files changed, 62 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
> > index 23dd5a5..d47df93 100644
> > +++ b/drivers/infiniband/core/netlink.c
> > @@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
> > static struct sock *nls;
> > static LIST_HEAD(client_list);
> >
> > +int ibnl_chk_listeners(unsigned int group)
> > +{
> > + if (netlink_has_listeners(nls, group) == 0)
> > + return -1;
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(ibnl_chk_listeners);
>
> I was thinking about this today, and, where is the security?
>
> What prevents a non-root user from making the above true and/or worse?
We are using netlink multicast. I believe that netlink_bind() only
allows root to bind to a multicast group.
static int netlink_bind(struct socket *sock, struct sockaddr *addr,
int addr_len)
{
...
/* Only superuser is allowed to listen multicasts */
if (groups) {
if (!netlink_allowed(sock, NL_CFG_F_NONROOT_RECV))
return -EPERM;
err = netlink_realloc_groups(sk);
if (err)
return err;
}
...
That said, I have not tested the ability to change the timeout settings
if one were to bind without multicast and send a message.
I'll see if I can get some time to test this as Kaike is out on vacation.
Ira
* Re: [PATCH v8 2/4] IB/core: Add rdma netlink helper functions
[not found] ` <20150805004830.GA23589-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
@ 2015-08-05 0:52 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2015-08-05 0:52 UTC (permalink / raw)
To: ira.weiny
Cc: kaike.wan-ral2JQCrhuEAvxtiuMwx3w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, John Fleck
On Tue, Aug 04, 2015 at 08:48:31PM -0400, ira.weiny wrote:
> We are using Netlink multicast. I believe that netlink_bind only allows root
> to bind to multicast.
That is a good start...
> That said I have not tested the ability to change the timeout settings if one
> were to bind without multicast and send a message.
I rather expect that needs fixing.
And I bet sequence numbers are sufficiently predictable that the path
reply should be restricted to root as well.
Jason
end of thread, other threads:[~2015-08-05 0:52 UTC | newest]
Thread overview: 13+ messages
2015-07-09 17:34 [PATCH v8 0/4] Sending kernel pathrecord query to user cache server kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-1-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-09 17:34 ` [PATCH v8 1/4] IB/netlink: Add defines for local service requests through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-2-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-22 20:55 ` Jason Gunthorpe
[not found] ` <20150722205557.GA20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 12:41 ` Wan, Kaike
2015-07-09 17:34 ` [PATCH v8 2/4] IB/core: Add rdma netlink helper functions kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-3-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-08-04 3:15 ` Jason Gunthorpe
[not found] ` <20150804031534.GA28707-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-08-05 0:48 ` ira.weiny
[not found] ` <20150805004830.GA23589-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-08-05 0:52 ` Jason Gunthorpe
2015-07-09 17:34 ` [PATCH v3 3/4] IB/sa: Allocate SA query with kzalloc kaike.wan-ral2JQCrhuEAvxtiuMwx3w
2015-07-09 17:34 ` [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink kaike.wan-ral2JQCrhuEAvxtiuMwx3w
[not found] ` <1436463268-32365-5-git-send-email-kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-07-22 21:09 ` Jason Gunthorpe
[not found] ` <20150722210918.GB20815-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 19:26 ` Wan, Kaike
[not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CB55290-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-07-23 20:14 ` Jason Gunthorpe