Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: Enabling peer to peer device transactions for PCIe devices
From: Dan Williams @ 2016-11-22 20:01 UTC (permalink / raw)
  To: Serguei Sagalovitch
  Cc: Deucher, Alexander, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org,
	Linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-pci@vger.kernel.org, Bridgman, John, Kuehling, Felix,
	Blinzer, Paul, Koenig, Christian, Suthikulpanit, Suravee,
	Sander, Ben, Dave Hansen
In-Reply-To: <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>

On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
<serguei.sagalovitch@amd.com> wrote:
> Dan,
>
> I personally like "device-DAX" idea but my concerns are:
>
> -  How well it will co-exists with the  DRM infrastructure / implementations
>    in part dealing with CPU pointers?

Inside the kernel a device-DAX range is "just memory" in the sense
that you can perform pfn_to_page() on it and issue I/O, but the vma is
not migratable. To be honest I do not know how well that co-exists
with drm infrastructure.

> -  How well we will be able to handle case when we need to "move"/"evict"
>    memory/data to the new location so CPU pointer should point to the new
> physical location/address
>     (and may be not in PCI device memory at all)?

So, device-DAX deliberately avoids support for in-kernel migration or
overcommit. Those cases are left to the core mm or drm. The device-dax
interface is for cases where all that is needed is a direct-mapping to
a statically-allocated physical-address range be it persistent memory
or some other special reserved memory range.

^ permalink raw reply

* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-22 19:49 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma, netdev, Dennis Dalessandro
In-Reply-To: <20161122170407.GE3956@obsidianresearch.com>

Ok, I do understand Jason's point that we should probably not put this driver 
under drivers/infiniband/sw/.., as this driver is not a HCA.
It is an ULP similar to ipoib, built on top of Omni-path irrespective of 
whether we register a hfi_vnic_bus or a direct custom interface with HFI1.
This ULP will transmit and recieve Omni-path packets over the fabric, and is 
dependent on IB MAD interface and the HFI1 driver.

Doug,
Will it be acceptable if we put it under 'drivers/infiniband/ulp/hfi_vnic'?

Niranjana

^ permalink raw reply

* [PATCH v5 9/9] selinux: Add a cache for quicker retreival of PKey SIDs
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
	sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
  Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

It is likely that the SID for the same PKey will be requested many
times. To reduce the time to modify QPs and process MADs use a cache to
store PKey SIDs.

This code is heavily based on the "netif" and "netport" concept
originally developed by James Morris <jmorris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> and Paul Moore
<paul-r2n+y4ga6xFZroRs9YW3xA@public.gmane.org> (see security/selinux/netif.c and
security/selinux/netport.c for more information)

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
v2:
- Renamed the files to ibpkey. Paul Moore
- Fixed a braket indentation mismatch in sel_pkey_find. Yuval Shaia
- Change spin_lock_bh to spin_lock_irqsave to resolve HARDIRQ lockdep
  warning.  Dan Jurgens
---
 security/selinux/Makefile         |   2 +-
 security/selinux/hooks.c          |   7 +-
 security/selinux/ibpkey.c         | 245 ++++++++++++++++++++++++++++++++++++++
 security/selinux/include/ibpkey.h |  31 +++++
 security/selinux/include/objsec.h |   6 +
 5 files changed, 288 insertions(+), 3 deletions(-)
 create mode 100644 security/selinux/ibpkey.c
 create mode 100644 security/selinux/include/ibpkey.h

diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index 3411c33..ff5895e 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -5,7 +5,7 @@
 obj-$(CONFIG_SECURITY_SELINUX) := selinux.o
 
 selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o netif.o \
-	     netnode.o netport.o exports.o \
+	     netnode.o netport.o ibpkey.o exports.o \
 	     ss/ebitmap.o ss/hashtab.o ss/symtab.o ss/sidtab.o ss/avtab.o \
 	     ss/policydb.o ss/services.o ss/conditional.o ss/mls.o ss/status.o
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0954dfe..2fe4320 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -90,6 +90,7 @@
 #include "netif.h"
 #include "netnode.h"
 #include "netport.h"
+#include "ibpkey.h"
 #include "xfrm.h"
 #include "netlabel.h"
 #include "audit.h"
@@ -173,8 +174,10 @@ static int selinux_netcache_avc_callback(u32 event)
 
 static int selinux_lsm_notifier_avc_callback(u32 event)
 {
-	if (event == AVC_CALLBACK_RESET)
+	if (event == AVC_CALLBACK_RESET) {
+		sel_pkey_flush();
 		call_lsm_notifier(LSM_POLICY_CHANGE, NULL);
+	}
 
 	return 0;
 }
@@ -6094,7 +6097,7 @@ static int selinux_ib_pkey_access(void *ib_sec, u64 subnet_prefix, u16 pkey_val)
 	struct ib_security_struct *sec = ib_sec;
 	struct lsm_pkey_audit pkey;
 
-	err = security_pkey_sid(subnet_prefix, pkey_val, &sid);
+	err = sel_pkey_sid(subnet_prefix, pkey_val, &sid);
 
 	if (err)
 		return err;
diff --git a/security/selinux/ibpkey.c b/security/selinux/ibpkey.c
new file mode 100644
index 0000000..6e52c54
--- /dev/null
+++ b/security/selinux/ibpkey.c
@@ -0,0 +1,245 @@
+/*
+ * Pkey table
+ *
+ * SELinux must keep a mapping of Infinband PKEYs to labels/SIDs.  This
+ * mapping is maintained as part of the normal policy but a fast cache is
+ * needed to reduce the lookup overhead.
+ *
+ * This code is heavily based on the "netif" and "netport" concept originally
+ * developed by
+ * James Morris <jmorris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> and
+ * Paul Moore <paul-r2n+y4ga6xFZroRs9YW3xA@public.gmane.org>
+ *   (see security/selinux/netif.c and security/selinux/netport.c for more
+ *   information)
+ *
+ */
+
+/*
+ * (c) Mellanox Technologies, 2016
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/rcupdate.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+#include "ibpkey.h"
+#include "objsec.h"
+
+#define SEL_PKEY_HASH_SIZE       256
+#define SEL_PKEY_HASH_BKT_LIMIT   16
+
+struct sel_pkey_bkt {
+	int size;
+	struct list_head list;
+};
+
+struct sel_pkey {
+	struct pkey_security_struct psec;
+	struct list_head list;
+	struct rcu_head rcu;
+};
+
+static LIST_HEAD(sel_pkey_list);
+static DEFINE_SPINLOCK(sel_pkey_lock);
+static struct sel_pkey_bkt sel_pkey_hash[SEL_PKEY_HASH_SIZE];
+
+/**
+ * sel_pkey_hashfn - Hashing function for the pkey table
+ * @pkey: pkey number
+ *
+ * Description:
+ * This is the hashing function for the pkey table, it returns the bucket
+ * number for the given pkey.
+ *
+ */
+static unsigned int sel_pkey_hashfn(u16 pkey)
+{
+	return (pkey & (SEL_PKEY_HASH_SIZE - 1));
+}
+
+/**
+ * sel_pkey_find - Search for a pkey record
+ * @subnet_prefix: subnet_prefix
+ * @pkey_num: pkey_num
+ *
+ * Description:
+ * Search the pkey table and return the matching record.  If an entry
+ * can not be found in the table return NULL.
+ *
+ */
+static struct sel_pkey *sel_pkey_find(u64 subnet_prefix, u16 pkey_num)
+{
+	unsigned int idx;
+	struct sel_pkey *pkey;
+
+	idx = sel_pkey_hashfn(pkey_num);
+	list_for_each_entry_rcu(pkey, &sel_pkey_hash[idx].list, list) {
+		if (pkey->psec.pkey == pkey_num &&
+		    pkey->psec.subnet_prefix == subnet_prefix)
+			return pkey;
+	}
+
+	return NULL;
+}
+
+/**
+ * sel_pkey_insert - Insert a new pkey into the table
+ * @pkey: the new pkey record
+ *
+ * Description:
+ * Add a new pkey record to the hash table.
+ *
+ */
+static void sel_pkey_insert(struct sel_pkey *pkey)
+{
+	unsigned int idx;
+
+	/* we need to impose a limit on the growth of the hash table so check
+	 * this bucket to make sure it is within the specified bounds
+	 */
+	idx = sel_pkey_hashfn(pkey->psec.pkey);
+	list_add_rcu(&pkey->list, &sel_pkey_hash[idx].list);
+	if (sel_pkey_hash[idx].size == SEL_PKEY_HASH_BKT_LIMIT) {
+		struct sel_pkey *tail;
+
+		tail = list_entry(
+			rcu_dereference_protected(
+				sel_pkey_hash[idx].list.prev,
+				lockdep_is_held(&sel_pkey_lock)),
+			struct sel_pkey, list);
+		list_del_rcu(&tail->list);
+		kfree_rcu(tail, rcu);
+	} else {
+		sel_pkey_hash[idx].size++;
+	}
+}
+
+/**
+ * sel_pkey_sid_slow - Lookup the SID of a pkey using the policy
+ * @subnet_prefix: subnet prefix
+ * @pkey_num: pkey number
+ * @sid: pkey SID
+ *
+ * Description:
+ * This function determines the SID of a pkey by querying the security
+ * policy.  The result is added to the pkey table to speedup future
+ * queries.  Returns zero on success, negative values on failure.
+ *
+ */
+static int sel_pkey_sid_slow(u64 subnet_prefix, u16 pkey_num, u32 *sid)
+{
+	int ret = -ENOMEM;
+	struct sel_pkey *pkey;
+	struct sel_pkey *new = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sel_pkey_lock, flags);
+	pkey = sel_pkey_find(subnet_prefix, pkey_num);
+	if (pkey) {
+		*sid = pkey->psec.sid;
+		spin_unlock_irqrestore(&sel_pkey_lock, flags);
+		return 0;
+	}
+
+	ret = security_pkey_sid(subnet_prefix, pkey_num, sid);
+	if (ret != 0)
+		goto out;
+
+	new = kzalloc(sizeof(*new), GFP_ATOMIC);
+	if (!new)
+		goto out;
+
+	new->psec.subnet_prefix = subnet_prefix;
+	new->psec.pkey = pkey_num;
+	new->psec.sid = *sid;
+	sel_pkey_insert(new);
+
+out:
+	spin_unlock_irqrestore(&sel_pkey_lock, flags);
+	if (unlikely(ret))
+		kfree(new);
+
+	return ret;
+}
+
+/**
+ * sel_pkey_sid - Lookup the SID of a PKEY
+ * @subnet_prefix: subnet_prefix
+ * @pkey_num: pkey number
+ * @sid: pkey SID
+ *
+ * Description:
+ * This function determines the SID of a PKEY using the fastest method
+ * possible.  First the pkey table is queried, but if an entry can't be found
+ * then the policy is queried and the result is added to the table to speedup
+ * future queries.  Returns zero on success, negative values on failure.
+ *
+ */
+int sel_pkey_sid(u64 subnet_prefix, u16 pkey_num, u32 *sid)
+{
+	struct sel_pkey *pkey;
+
+	rcu_read_lock();
+	pkey = sel_pkey_find(subnet_prefix, pkey_num);
+	if (pkey) {
+		*sid = pkey->psec.sid;
+		rcu_read_unlock();
+		return 0;
+	}
+	rcu_read_unlock();
+
+	return sel_pkey_sid_slow(subnet_prefix, pkey_num, sid);
+}
+
+/**
+ * sel_pkey_flush - Flush the entire pkey table
+ *
+ * Description:
+ * Remove all entries from the pkey table
+ *
+ */
+void sel_pkey_flush(void)
+{
+	unsigned int idx;
+	struct sel_pkey *pkey, *pkey_tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sel_pkey_lock, flags);
+	for (idx = 0; idx < SEL_PKEY_HASH_SIZE; idx++) {
+		list_for_each_entry_safe(pkey, pkey_tmp,
+					 &sel_pkey_hash[idx].list, list) {
+			list_del_rcu(&pkey->list);
+			kfree_rcu(pkey, rcu);
+		}
+		sel_pkey_hash[idx].size = 0;
+	}
+	spin_unlock_irqrestore(&sel_pkey_lock, flags);
+}
+
+static __init int sel_pkey_init(void)
+{
+	int iter;
+
+	if (!selinux_enabled)
+		return 0;
+
+	for (iter = 0; iter < SEL_PKEY_HASH_SIZE; iter++) {
+		INIT_LIST_HEAD(&sel_pkey_hash[iter].list);
+		sel_pkey_hash[iter].size = 0;
+	}
+
+	return 0;
+}
+
+subsys_initcall(sel_pkey_init);
diff --git a/security/selinux/include/ibpkey.h b/security/selinux/include/ibpkey.h
new file mode 100644
index 0000000..387885a
--- /dev/null
+++ b/security/selinux/include/ibpkey.h
@@ -0,0 +1,31 @@
+/*
+ * pkey table
+ *
+ * SELinux must keep a mapping of pkeys to labels/SIDs.  This
+ * mapping is maintained as part of the normal policy but a fast cache is
+ * needed to reduce the lookup overhead.
+ *
+ */
+
+/*
+ * (c) Mellanox Technologies, 2016
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _SELINUX_IB_PKEY_H
+#define _SELINUX_IB_PKEY_H
+
+void sel_pkey_flush(void);
+
+int sel_pkey_sid(u64 subnet_prefix, u16 pkey, u32 *sid);
+
+#endif
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 8e7db43..4139f28 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -133,6 +133,12 @@ struct ib_security_struct {
 	u32 sid;        /* SID of the queue pair or MAD agent */
 };
 
+struct pkey_security_struct {
+	u64	subnet_prefix; /* Port subnet prefix */
+	u16	pkey;	/* PKey number */
+	u32	sid;	/* SID of pkey */
+};
+
 extern unsigned int selinux_checkreqprot;
 
 #endif /* _SELINUX_OBJSEC_H_ */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 8/9] selinux: Add IB Port SMP access vector
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Add a type for Infiniband ports and an access vector for subnet
management packets. Implement the ib_port_smp hook to check that the
caller has permission to send and receive SMPs on the end port specified
by the device name and port. Add interface to query the SID for a IB
port, which walks the IB_PORT ocontexts to find an entry for the
given name and port.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

---
v2:
- Shorted ib_end_port. Paul Moore
- Pass void blobs to security hooks. Paul Moore
- Log specific IB port info in audit log. Paul Moore
- Don't create a new intial sid, use unlabeled. Stephen Smalley
- Changed "smp" to "manage_subnet". Paul Moore

v3:
- ib_port -> ib_endport. Paul Moore
- Don't log device name as untrusted string. Paul Moore
- Reorder parameters of LSM hook. Paul Moore
---
 include/linux/lsm_audit.h           |  8 +++++++
 security/lsm_audit.c                |  5 +++++
 security/selinux/hooks.c            | 25 ++++++++++++++++++++++
 security/selinux/include/classmap.h |  2 ++
 security/selinux/include/security.h |  2 ++
 security/selinux/ss/services.c      | 42 +++++++++++++++++++++++++++++++++++++
 6 files changed, 84 insertions(+)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 402b770..7047b4c 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -21,6 +21,7 @@
 #include <linux/path.h>
 #include <linux/key.h>
 #include <linux/skbuff.h>
+#include <rdma/ib_verbs.h>
 
 struct lsm_network_audit {
 	int netif;
@@ -50,6 +51,11 @@ struct lsm_pkey_audit {
 	u16	pkey;
 };
 
+struct lsm_ib_endport_audit {
+	char	dev_name[IB_DEVICE_NAME_MAX];
+	u8	port_num;
+};
+
 /* Auxiliary data to use in generating the audit record. */
 struct common_audit_data {
 	char type;
@@ -66,6 +72,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_IOCTL_OP	11
 #define LSM_AUDIT_DATA_FILE	12
 #define LSM_AUDIT_DATA_PKEY	13
+#define LSM_AUDIT_DATA_IB_ENDPORT 14
 	union 	{
 		struct path path;
 		struct dentry *dentry;
@@ -84,6 +91,7 @@ struct common_audit_data {
 		struct lsm_ioctlop_audit *op;
 		struct file *file;
 		struct lsm_pkey_audit *pkey;
+		struct lsm_ib_endport_audit *ib_endport;
 	} u;
 	/* this union contains LSM specific data */
 	union {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index b18d277..549fe9d 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -423,6 +423,11 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 				 a->u.pkey->pkey, &sbn_pfx);
 		break;
 	}
+	case LSM_AUDIT_DATA_IB_ENDPORT:
+		audit_log_format(ab, " device=%s port_num=%u",
+				 a->u.ib_endport->dev_name,
+				 a->u.ib_endport->port_num);
+		break;
 	} /* switch (a->type) */
 }
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e21f7690..0954dfe 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6108,6 +6108,29 @@ static int selinux_ib_pkey_access(void *ib_sec, u64 subnet_prefix, u16 pkey_val)
 			    INFINIBAND_PKEY__ACCESS, &ad);
 }
 
+static int selinux_ib_endport_manage_subnet(void *ib_sec, const char *dev_name,
+					    u8 port_num)
+{
+	struct common_audit_data ad;
+	int err;
+	u32 sid = 0;
+	struct ib_security_struct *sec = ib_sec;
+	struct lsm_ib_endport_audit ib_endport;
+
+	err = security_ib_endport_sid(dev_name, port_num, &sid);
+
+	if (err)
+		return err;
+
+	ad.type = LSM_AUDIT_DATA_IB_ENDPORT;
+	strncpy(ib_endport.dev_name, dev_name, sizeof(ib_endport.dev_name));
+	ib_endport.port_num = port_num;
+	ad.u.ib_endport = &ib_endport;
+	return avc_has_perm(sec->sid, sid,
+			    SECCLASS_INFINIBAND_ENDPORT,
+			    INFINIBAND_ENDPORT__MANAGE_SUBNET, &ad);
+}
+
 static int selinux_ib_alloc_security(void **ib_sec)
 {
 	struct ib_security_struct *sec;
@@ -6313,6 +6336,8 @@ static struct security_hook_list selinux_hooks[] = {
 	LSM_HOOK_INIT(tun_dev_open, selinux_tun_dev_open),
 #ifdef CONFIG_SECURITY_INFINIBAND
 	LSM_HOOK_INIT(ib_pkey_access, selinux_ib_pkey_access),
+	LSM_HOOK_INIT(ib_endport_manage_subnet,
+		      selinux_ib_endport_manage_subnet),
 	LSM_HOOK_INIT(ib_alloc_security, selinux_ib_alloc_security),
 	LSM_HOOK_INIT(ib_free_security, selinux_ib_free_security),
 #endif
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index d42dd4d..f93b64b 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -167,5 +167,7 @@ struct security_class_mapping secclass_map[] = {
 	  { COMMON_CAP2_PERMS, NULL } },
 	{ "infiniband_pkey",
 	  { "access", NULL } },
+	{ "infiniband_endport",
+	  { "manage_subnet", NULL } },
 	{ NULL }
   };
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 17afb7c..8a6e5e7 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -178,6 +178,8 @@ int security_port_sid(u8 protocol, u16 port, u32 *out_sid);
 
 int security_pkey_sid(u64 subnet_prefix, u16 pkey_num, u32 *out_sid);
 
+int security_ib_endport_sid(const char *dev_name, u8 port_num, u32 *out_sid);
+
 int security_netif_sid(char *name, u32 *if_sid);
 
 int security_node_sid(u16 domain, void *addr, u32 addrlen,
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 085c54b..db40ce7 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2244,6 +2244,48 @@ int security_pkey_sid(u64 subnet_prefix, u16 pkey_num, u32 *out_sid)
 }
 
 /**
+ * security_ib_endport_sid - Obtain the SID for a subnet management interface.
+ * @dev_name: device name
+ * @port: port number
+ * @out_sid: security identifier
+ */
+int security_ib_endport_sid(const char *dev_name, u8 port_num, u32 *out_sid)
+{
+	struct ocontext *c;
+	int rc = 0;
+
+	read_lock(&policy_rwlock);
+
+	c = policydb.ocontexts[OCON_IB_ENDPORT];
+	while (c) {
+		if (c->u.ib_endport.port_num == port_num &&
+		    !strncmp(c->u.ib_endport.dev_name,
+			     dev_name,
+			     IB_DEVICE_NAME_MAX))
+			break;
+
+		c = c->next;
+	}
+
+	if (c) {
+		if (!c->sid[0]) {
+			rc = sidtab_context_to_sid(&sidtab,
+						   &c->context[0],
+						   &c->sid[0]);
+			if (rc)
+				goto out;
+		}
+		*out_sid = c->sid[0];
+	} else {
+		*out_sid = SECINITSID_UNLABELED;
+	}
+
+out:
+	read_unlock(&policy_rwlock);
+	return rc;
+}
+
+/**
  * security_netif_sid - Obtain the SID for a network interface.
  * @name: interface name
  * @if_sid: interface SID
-- 
2.7.4


^ permalink raw reply related

* [PATCH v5 7/9] selinux: Implement Infiniband PKey "Access" access vector
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Add a type and access vector for PKeys. Implement the ib_pkey_access
hook to check that the caller has permission to access the PKey on the
given subnet prefix. Add an interface to get the PKey SID. Walk the PKey
ocontexts to find an entry for the given subnet prefix and pkey.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

---
v2:
- Use void* blobs for security structs. Paul Moore
- Add pkey specific data to the audit log. Paul Moore
- Don't introduce a new initial sid, use unlabeled. Stephen Smalley

v3:
- Reorder parameters to pkey_access hook. Paul Moore
---
 include/linux/lsm_audit.h           |  7 +++++++
 security/lsm_audit.c                | 13 ++++++++++++
 security/selinux/hooks.c            | 23 +++++++++++++++++++++
 security/selinux/include/classmap.h |  2 ++
 security/selinux/include/security.h |  2 ++
 security/selinux/ss/services.c      | 41 +++++++++++++++++++++++++++++++++++++
 6 files changed, 88 insertions(+)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index e58e577..402b770 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -45,6 +45,11 @@ struct lsm_ioctlop_audit {
 	u16 cmd;
 };
 
+struct lsm_pkey_audit {
+	u64	subnet_prefix;
+	u16	pkey;
+};
+
 /* Auxiliary data to use in generating the audit record. */
 struct common_audit_data {
 	char type;
@@ -60,6 +65,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_DENTRY	10
 #define LSM_AUDIT_DATA_IOCTL_OP	11
 #define LSM_AUDIT_DATA_FILE	12
+#define LSM_AUDIT_DATA_PKEY	13
 	union 	{
 		struct path path;
 		struct dentry *dentry;
@@ -77,6 +83,7 @@ struct common_audit_data {
 		char *kmod_name;
 		struct lsm_ioctlop_audit *op;
 		struct file *file;
+		struct lsm_pkey_audit *pkey;
 	} u;
 	/* this union contains LSM specific data */
 	union {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 37f04da..b18d277 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -410,6 +410,19 @@ static void dump_common_audit_data(struct audit_buffer *ab,
 		audit_log_format(ab, " kmod=");
 		audit_log_untrustedstring(ab, a->u.kmod_name);
 		break;
+	case LSM_AUDIT_DATA_PKEY: {
+		struct in6_addr sbn_pfx;
+
+		memset(&sbn_pfx.s6_addr, 0,
+		       sizeof(sbn_pfx.s6_addr));
+
+		memcpy(&sbn_pfx.s6_addr, &a->u.pkey->subnet_prefix,
+		       sizeof(a->u.pkey->subnet_prefix));
+
+		audit_log_format(ab, " pkey=0x%x subnet_prefix=%pI6c",
+				 a->u.pkey->pkey, &sbn_pfx);
+		break;
+	}
 	} /* switch (a->type) */
 }
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d87e29d..e21f7690 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6086,6 +6086,28 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
 #endif
 
 #ifdef CONFIG_SECURITY_INFINIBAND
+static int selinux_ib_pkey_access(void *ib_sec, u64 subnet_prefix, u16 pkey_val)
+{
+	struct common_audit_data ad;
+	int err;
+	u32 sid = 0;
+	struct ib_security_struct *sec = ib_sec;
+	struct lsm_pkey_audit pkey;
+
+	err = security_pkey_sid(subnet_prefix, pkey_val, &sid);
+
+	if (err)
+		return err;
+
+	ad.type = LSM_AUDIT_DATA_PKEY;
+	pkey.subnet_prefix = subnet_prefix;
+	pkey.pkey = pkey_val;
+	ad.u.pkey = &pkey;
+	return avc_has_perm(sec->sid, sid,
+			    SECCLASS_INFINIBAND_PKEY,
+			    INFINIBAND_PKEY__ACCESS, &ad);
+}
+
 static int selinux_ib_alloc_security(void **ib_sec)
 {
 	struct ib_security_struct *sec;
@@ -6290,6 +6312,7 @@ static struct security_hook_list selinux_hooks[] = {
 	LSM_HOOK_INIT(tun_dev_attach, selinux_tun_dev_attach),
 	LSM_HOOK_INIT(tun_dev_open, selinux_tun_dev_open),
 #ifdef CONFIG_SECURITY_INFINIBAND
+	LSM_HOOK_INIT(ib_pkey_access, selinux_ib_pkey_access),
 	LSM_HOOK_INIT(ib_alloc_security, selinux_ib_alloc_security),
 	LSM_HOOK_INIT(ib_free_security, selinux_ib_free_security),
 #endif
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 1f1f4b2..d42dd4d 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -165,5 +165,7 @@ struct security_class_mapping secclass_map[] = {
 	  { COMMON_CAP_PERMS, NULL } },
 	{ "cap2_userns",
 	  { COMMON_CAP2_PERMS, NULL } },
+	{ "infiniband_pkey",
+	  { "access", NULL } },
 	{ NULL }
   };
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 6bb9b0a..17afb7c 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -176,6 +176,8 @@ int security_get_user_sids(u32 callsid, char *username,
 
 int security_port_sid(u8 protocol, u16 port, u32 *out_sid);
 
+int security_pkey_sid(u64 subnet_prefix, u16 pkey_num, u32 *out_sid);
+
 int security_netif_sid(char *name, u32 *if_sid);
 
 int security_node_sid(u16 domain, void *addr, u32 addrlen,
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 082b20c..085c54b 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2203,6 +2203,47 @@ int security_port_sid(u8 protocol, u16 port, u32 *out_sid)
 }
 
 /**
+ * security_pkey_sid - Obtain the SID for a pkey.
+ * @subnet_prefix: Subnet Prefix
+ * @pkey_num: pkey number
+ * @out_sid: security identifier
+ */
+int security_pkey_sid(u64 subnet_prefix, u16 pkey_num, u32 *out_sid)
+{
+	struct ocontext *c;
+	int rc = 0;
+
+	read_lock(&policy_rwlock);
+
+	c = policydb.ocontexts[OCON_PKEY];
+	while (c) {
+		if (c->u.pkey.low_pkey <= pkey_num &&
+		    c->u.pkey.high_pkey >= pkey_num &&
+		    c->u.pkey.subnet_prefix == subnet_prefix)
+			break;
+
+		c = c->next;
+	}
+
+	if (c) {
+		if (!c->sid[0]) {
+			rc = sidtab_context_to_sid(&sidtab,
+						   &c->context[0],
+						   &c->sid[0]);
+			if (rc)
+				goto out;
+		}
+		*out_sid = c->sid[0];
+	} else {
+		*out_sid = SECINITSID_UNLABELED;
+	}
+
+out:
+	read_unlock(&policy_rwlock);
+	return rc;
+}
+
+/**
  * security_netif_sid - Obtain the SID for a network interface.
  * @name: interface name
  * @if_sid: interface SID
-- 
2.7.4


^ permalink raw reply related

* [PATCH v5 6/9] selinux: Allocate and free infiniband security hooks
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Implement and attach hooks to allocate and free Infiniband object
security structures.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

---
v2:
- Use void * blobs for security structs.  Paul Moore
- Shorten ib_end_port to ib_port.  Paul Moore
- Allocate memory for security struct with GFP_KERNEL. Yuval Shaia
---
 security/selinux/hooks.c          | 25 ++++++++++++++++++++++++-
 security/selinux/include/objsec.h |  5 +++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 2d7a7c1..d87e29d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -17,6 +17,7 @@
  *	Paul Moore <paul@paul-moore.com>
  *  Copyright (C) 2007 Hitachi Software Engineering Co., Ltd.
  *		       Yuichi Nakamura <ynakam@hitachisoft.jp>
+ *  Copyright (C) 2016 Mellanox Technologies
  *
  *	This program is free software; you can redistribute it and/or modify
  *	it under the terms of the GNU General Public License version 2,
@@ -6082,7 +6083,26 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
 	*_buffer = context;
 	return rc;
 }
+#endif
+
+#ifdef CONFIG_SECURITY_INFINIBAND
+static int selinux_ib_alloc_security(void **ib_sec)
+{
+	struct ib_security_struct *sec;
+
+	sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+	if (!sec)
+		return -ENOMEM;
+	sec->sid = current_sid();
+
+	*ib_sec = sec;
+	return 0;
+}
 
+static void selinux_ib_free_security(void *ib_sec)
+{
+	kfree(ib_sec);
+}
 #endif
 
 static struct security_hook_list selinux_hooks[] = {
@@ -6269,7 +6289,10 @@ static struct security_hook_list selinux_hooks[] = {
 	LSM_HOOK_INIT(tun_dev_attach_queue, selinux_tun_dev_attach_queue),
 	LSM_HOOK_INIT(tun_dev_attach, selinux_tun_dev_attach),
 	LSM_HOOK_INIT(tun_dev_open, selinux_tun_dev_open),
-
+#ifdef CONFIG_SECURITY_INFINIBAND
+	LSM_HOOK_INIT(ib_alloc_security, selinux_ib_alloc_security),
+	LSM_HOOK_INIT(ib_free_security, selinux_ib_free_security),
+#endif
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 	LSM_HOOK_INIT(xfrm_policy_alloc_security, selinux_xfrm_policy_alloc),
 	LSM_HOOK_INIT(xfrm_policy_clone_security, selinux_xfrm_policy_clone),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index c21e135..8e7db43 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -10,6 +10,7 @@
  *
  *  Copyright (C) 2001,2002 Networks Associates Technology, Inc.
  *  Copyright (C) 2003 Red Hat, Inc., James Morris <jmorris@redhat.com>
+ *  Copyright (C) 2016 Mellanox Technologies
  *
  *	This program is free software; you can redistribute it and/or modify
  *	it under the terms of the GNU General Public License version 2,
@@ -128,6 +129,10 @@ struct key_security_struct {
 	u32 sid;	/* SID of key */
 };
 
+struct ib_security_struct {
+	u32 sid;        /* SID of the queue pair or MAD agent */
+};
+
 extern unsigned int selinux_checkreqprot;
 
 #endif /* _SELINUX_OBJSEC_H_ */
-- 
2.7.4


^ permalink raw reply related

* [PATCH v5 5/9] selinux: Create policydb version for Infiniband support
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Support for Infiniband requires the addition of two new object contexts,
one for infiniband PKeys and another IB Ports. Added handlers to read
and write the new ocontext types when reading or writing a binary policy
representation.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>

---
v2:
- Shorten ib_end_port to ib_port. Paul Moore
- Added bounds checking to port number. Paul Moore
- Eliminated {} in OCON_PKEY case statement.  Yuval Shaia

v3:
- ib_port -> ib_endport. Paul Moore

v4:
- removed unneeded brackets in ocontext_read. Paul Moore
---
 security/selinux/include/security.h |   3 +-
 security/selinux/ss/policydb.c      | 129 +++++++++++++++++++++++++++++++-----
 security/selinux/ss/policydb.h      |  27 +++++---
 3 files changed, 135 insertions(+), 24 deletions(-)

diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 308a286..6bb9b0a 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -36,10 +36,11 @@
 #define POLICYDB_VERSION_DEFAULT_TYPE	28
 #define POLICYDB_VERSION_CONSTRAINT_NAMES	29
 #define POLICYDB_VERSION_XPERMS_IOCTL	30
+#define POLICYDB_VERSION_INFINIBAND		31
 
 /* Range of policy versions we understand*/
 #define POLICYDB_VERSION_MIN   POLICYDB_VERSION_BASE
-#define POLICYDB_VERSION_MAX	POLICYDB_VERSION_XPERMS_IOCTL
+#define POLICYDB_VERSION_MAX   POLICYDB_VERSION_INFINIBAND
 
 /* Mask for just the mount related flags */
 #define SE_MNTMASK	0x0f
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index d719db4..24e16da 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -17,6 +17,11 @@
  *
  *      Added support for the policy capability bitmap
  *
+ * Update: Mellanox Techonologies
+ *
+ *	Added Infiniband support
+ *
+ * Copyright (C) 2016 Mellanox Techonologies
  * Copyright (C) 2007 Hewlett-Packard Development Company, L.P.
  * Copyright (C) 2004-2005 Trusted Computer Solutions, Inc.
  * Copyright (C) 2003 - 2004 Tresys Technology, LLC
@@ -76,81 +81,86 @@ static struct policydb_compat_info policydb_compat[] = {
 	{
 		.version	= POLICYDB_VERSION_BASE,
 		.sym_num	= SYM_NUM - 3,
-		.ocon_num	= OCON_NUM - 1,
+		.ocon_num	= OCON_NUM - 3,
 	},
 	{
 		.version	= POLICYDB_VERSION_BOOL,
 		.sym_num	= SYM_NUM - 2,
-		.ocon_num	= OCON_NUM - 1,
+		.ocon_num	= OCON_NUM - 3,
 	},
 	{
 		.version	= POLICYDB_VERSION_IPV6,
 		.sym_num	= SYM_NUM - 2,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_NLCLASS,
 		.sym_num	= SYM_NUM - 2,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_MLS,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_AVTAB,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_RANGETRANS,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_POLCAP,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_PERMISSIVE,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_BOUNDARY,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_FILENAME_TRANS,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_ROLETRANS,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_NEW_OBJECT_DEFAULTS,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_DEFAULT_TYPE,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_CONSTRAINT_NAMES,
 		.sym_num	= SYM_NUM,
-		.ocon_num	= OCON_NUM,
+		.ocon_num	= OCON_NUM - 2,
 	},
 	{
 		.version	= POLICYDB_VERSION_XPERMS_IOCTL,
 		.sym_num	= SYM_NUM,
+		.ocon_num	= OCON_NUM - 2,
+	},
+	{
+		.version	= POLICYDB_VERSION_INFINIBAND,
+		.sym_num	= SYM_NUM,
 		.ocon_num	= OCON_NUM,
 	},
 };
@@ -2222,6 +2232,60 @@ static int ocontext_read(struct policydb *p, struct policydb_compat_info *info,
 					goto out;
 				break;
 			}
+			case OCON_PKEY:
+				rc = next_entry(nodebuf, fp, sizeof(u32) * 6);
+				if (rc)
+					goto out;
+
+				c->u.pkey.subnet_prefix = be64_to_cpu(*((__be64 *)nodebuf));
+				/* The subnet prefix is stored as an IPv6
+				 * address in the policy.
+				 *
+				 * Check that the lower 2 DWORDS are 0.
+				 */
+				if (nodebuf[2] || nodebuf[3]) {
+					rc = -EINVAL;
+					goto out;
+				}
+
+				if (nodebuf[4] > 0xffff ||
+				    nodebuf[5] > 0xffff) {
+					rc = -EINVAL;
+					goto out;
+				}
+
+				c->u.pkey.low_pkey = le32_to_cpu(nodebuf[4]);
+				c->u.pkey.high_pkey = le32_to_cpu(nodebuf[5]);
+
+				rc = context_read_and_validate(&c->context[0],
+							       p,
+							       fp);
+				if (rc)
+					goto out;
+				break;
+			case OCON_IB_ENDPORT:
+				rc = next_entry(buf, fp, sizeof(u32) * 2);
+				if (rc)
+					goto out;
+				len = le32_to_cpu(buf[0]);
+
+				rc = str_read(&c->u.ib_endport.dev_name, GFP_KERNEL, fp, len);
+				if (rc)
+					goto out;
+
+				if (buf[1] > 0xff || buf[1] == 0) {
+					rc = -EINVAL;
+					goto out;
+				}
+
+				c->u.ib_endport.port_num = le32_to_cpu(buf[1]);
+
+				rc = context_read_and_validate(&c->context[0],
+							       p,
+							       fp);
+				if (rc)
+					goto out;
+				break;
 			}
 		}
 	}
@@ -3151,6 +3215,41 @@ static int ocontext_write(struct policydb *p, struct policydb_compat_info *info,
 				if (rc)
 					return rc;
 				break;
+			case OCON_PKEY:
+				*((__be64 *)nodebuf) = cpu_to_be64(c->u.pkey.subnet_prefix);
+
+				/*
+				 * The low order 2 bits were confirmed to be 0
+				 * when the policy was loaded. Write them out
+				 * as zero
+				 */
+				nodebuf[2] = 0;
+				nodebuf[3] = 0;
+
+				nodebuf[4] = cpu_to_le32(c->u.pkey.low_pkey);
+				nodebuf[5] = cpu_to_le32(c->u.pkey.high_pkey);
+
+				rc = put_entry(nodebuf, sizeof(u32), 6, fp);
+				if (rc)
+					return rc;
+				rc = context_write(p, &c->context[0], fp);
+				if (rc)
+					return rc;
+				break;
+			case OCON_IB_ENDPORT:
+				len = strlen(c->u.ib_endport.dev_name);
+				buf[0] = cpu_to_le32(len);
+				buf[1] = cpu_to_le32(c->u.ib_endport.port_num);
+				rc = put_entry(buf, sizeof(u32), 2, fp);
+				if (rc)
+					return rc;
+				rc = put_entry(c->u.ib_endport.dev_name, 1, len, fp);
+				if (rc)
+					return rc;
+				rc = context_write(p, &c->context[0], fp);
+				if (rc)
+					return rc;
+				break;
 			}
 		}
 	}
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index 725d594..edb329d 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -187,6 +187,15 @@ struct ocontext {
 			u32 addr[4];
 			u32 mask[4];
 		} node6;        /* IPv6 node information */
+		struct {
+			u64 subnet_prefix;
+			u16 low_pkey;
+			u16 high_pkey;
+		} pkey;
+		struct {
+			char *dev_name;
+			u8 port_num;
+		} ib_endport;
 	} u;
 	union {
 		u32 sclass;  /* security class for genfs */
@@ -215,14 +224,16 @@ struct genfs {
 #define SYM_NUM     8
 
 /* object context array indices */
-#define OCON_ISID  0	/* initial SIDs */
-#define OCON_FS    1	/* unlabeled file systems */
-#define OCON_PORT  2	/* TCP and UDP port numbers */
-#define OCON_NETIF 3	/* network interfaces */
-#define OCON_NODE  4	/* nodes */
-#define OCON_FSUSE 5	/* fs_use */
-#define OCON_NODE6 6	/* IPv6 nodes */
-#define OCON_NUM   7
+#define OCON_ISID	0 /* initial SIDs */
+#define OCON_FS		1 /* unlabeled file systems */
+#define OCON_PORT	2 /* TCP and UDP port numbers */
+#define OCON_NETIF	3 /* network interfaces */
+#define OCON_NODE	4 /* nodes */
+#define OCON_FSUSE	5 /* fs_use */
+#define OCON_NODE6	6 /* IPv6 nodes */
+#define OCON_PKEY	7 /* Infiniband PKeys */
+#define OCON_IB_ENDPORT	8 /* Infiniband end ports */
+#define OCON_NUM	9
 
 /* The policy database */
 struct policydb {
-- 
2.7.4


^ permalink raw reply related

* [PATCH v5 4/9] IB/core: Enforce security on management datagrams
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
	sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
  Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Allocate and free a security context when creating and destroying a MAD
agent.  This context is used for controlling access to PKeys and sending
and receiving SMPs.

When sending or receiving a MAD check that the agent has permission to
access the PKey for the Subnet Prefix of the port.

During MAD and snoop agent registration for SMI QPs check that the
calling process has permission to access the manage the subnet  and
register a callback with the LSM to be notified of policy changes. When
notificaiton of a policy change occurs recheck permission and set a flag
indicating sending and receiving SMPs is allowed.

When sending and receiving MADs check that the agent has access to the
SMI if it's on an SMI QP.  Because security policy can change it's
possible permission was allowed when creating the agent, but no longer
is.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
v2:
- Squashed LSM hook additions. Paul Moore
- Changed security blobs to void*. Paul Moore
- Shorten end_port to port. Paul Moore
- Change "smp" to "manage_subnet". Paul Moore
- Use the LSM policy change notification and a flag to track permission
  instead of calling the LSM hook for every SMP. Dan Jurgens
- Squashed PKey and SMP enforcement into the same patch and moved the
  logic into security.c. Dan Jurgens

v3:
- ib_port -> ib_endport. Paul Moore
- Use notifier chains for LSM notification. Paul Moore
- Reorder LSM hook parameters to put sec first. Paul Moore
---
 drivers/infiniband/core/core_priv.h | 35 ++++++++++++++
 drivers/infiniband/core/mad.c       | 52 +++++++++++++++++----
 drivers/infiniband/core/security.c  | 92 +++++++++++++++++++++++++++++++++++++
 include/linux/lsm_hooks.h           |  8 ++++
 include/linux/security.h            |  6 +++
 include/rdma/ib_mad.h               |  4 ++
 security/security.c                 |  8 ++++
 7 files changed, 197 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 68e3de0..2c35162 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -37,6 +37,8 @@
 #include <linux/spinlock.h>
 
 #include <rdma/ib_verbs.h>
+#include <rdma/ib_mad.h>
+#include "mad_priv.h"
 
 struct pkey_index_qp_list {
 	struct list_head    pkey_index_list;
@@ -166,6 +168,11 @@ int ib_get_cached_subnet_prefix(struct ib_device *device,
 				u64              *sn_pfx);
 
 #ifdef CONFIG_SECURITY_INFINIBAND
+int ib_security_pkey_access(struct ib_device *dev,
+			    u8 port_num,
+			    u16 pkey_index,
+			    void *sec);
+
 void ib_security_destroy_port_pkey_list(struct ib_device *device);
 
 void ib_security_cache_change(struct ib_device *device,
@@ -183,7 +190,19 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec);
 void ib_destroy_qp_security_end(struct ib_qp_security *sec);
 int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev);
 void ib_close_shared_qp_security(struct ib_qp_security *sec);
+int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
+				enum ib_qp_type qp_type);
+void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent);
+int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index);
 #else
+static inline int ib_security_pkey_access(struct ib_device *dev,
+					  u8 port_num,
+					  u16 pkey_index,
+					  void *sec)
+{
+	return 0;
+}
+
 static inline void ib_security_destroy_port_pkey_list(struct ib_device *device)
 {
 }
@@ -232,5 +251,21 @@ static inline int ib_open_shared_qp_security(struct ib_qp *qp,
 static inline void ib_close_shared_qp_security(struct ib_qp_security *sec)
 {
 }
+
+static inline int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
+					      enum ib_qp_type qp_type)
+{
+	return 0;
+}
+
+static inline void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
+{
+}
+
+static inline int ib_mad_enforce_security(struct ib_mad_agent_private *map,
+					  u16 pkey_index)
+{
+	return 0;
+}
 #endif
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 40cbd6b..b6041ec 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -40,9 +40,11 @@
 #include <linux/dma-mapping.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/security.h>
 #include <rdma/ib_cache.h>
 
 #include "mad_priv.h"
+#include "core_priv.h"
 #include "mad_rmpp.h"
 #include "smi.h"
 #include "opa_smi.h"
@@ -367,6 +369,12 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	atomic_set(&mad_agent_priv->refcount, 1);
 	init_completion(&mad_agent_priv->comp);
 
+	ret2 = ib_mad_agent_security_setup(&mad_agent_priv->agent, qp_type);
+	if (ret2) {
+		ret = ERR_PTR(ret2);
+		goto error4;
+	}
+
 	spin_lock_irqsave(&port_priv->reg_lock, flags);
 	mad_agent_priv->agent.hi_tid = ++ib_mad_client_id;
 
@@ -384,7 +392,7 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 				if (method) {
 					if (method_in_use(&method,
 							   mad_reg_req))
-						goto error4;
+						goto error5;
 				}
 			}
 			ret2 = add_nonoui_reg_req(mad_reg_req, mad_agent_priv,
@@ -400,14 +408,14 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 					if (is_vendor_method_in_use(
 							vendor_class,
 							mad_reg_req))
-						goto error4;
+						goto error5;
 				}
 			}
 			ret2 = add_oui_reg_req(mad_reg_req, mad_agent_priv);
 		}
 		if (ret2) {
 			ret = ERR_PTR(ret2);
-			goto error4;
+			goto error5;
 		}
 	}
 
@@ -416,9 +424,10 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
 	spin_unlock_irqrestore(&port_priv->reg_lock, flags);
 
 	return &mad_agent_priv->agent;
-
-error4:
+error5:
 	spin_unlock_irqrestore(&port_priv->reg_lock, flags);
+	ib_mad_agent_security_cleanup(&mad_agent_priv->agent);
+error4:
 	kfree(reg_req);
 error3:
 	kfree(mad_agent_priv);
@@ -489,6 +498,7 @@ struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
 	struct ib_mad_agent *ret;
 	struct ib_mad_snoop_private *mad_snoop_priv;
 	int qpn;
+	int err;
 
 	/* Validate parameters */
 	if ((is_snooping_sends(mad_snoop_flags) && !snoop_handler) ||
@@ -523,17 +533,25 @@ struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
 	mad_snoop_priv->agent.port_num = port_num;
 	mad_snoop_priv->mad_snoop_flags = mad_snoop_flags;
 	init_completion(&mad_snoop_priv->comp);
+
+	err = ib_mad_agent_security_setup(&mad_snoop_priv->agent, qp_type);
+	if (err) {
+		ret = ERR_PTR(err);
+		goto error2;
+	}
+
 	mad_snoop_priv->snoop_index = register_snoop_agent(
 						&port_priv->qp_info[qpn],
 						mad_snoop_priv);
 	if (mad_snoop_priv->snoop_index < 0) {
 		ret = ERR_PTR(mad_snoop_priv->snoop_index);
-		goto error2;
+		goto error3;
 	}
 
 	atomic_set(&mad_snoop_priv->refcount, 1);
 	return &mad_snoop_priv->agent;
-
+error3:
+	ib_mad_agent_security_cleanup(&mad_snoop_priv->agent);
 error2:
 	kfree(mad_snoop_priv);
 error1:
@@ -579,6 +597,8 @@ static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
 	deref_mad_agent(mad_agent_priv);
 	wait_for_completion(&mad_agent_priv->comp);
 
+	ib_mad_agent_security_cleanup(&mad_agent_priv->agent);
+
 	kfree(mad_agent_priv->reg_req);
 	kfree(mad_agent_priv);
 }
@@ -597,6 +617,8 @@ static void unregister_mad_snoop(struct ib_mad_snoop_private *mad_snoop_priv)
 	deref_snoop_agent(mad_snoop_priv);
 	wait_for_completion(&mad_snoop_priv->comp);
 
+	ib_mad_agent_security_cleanup(&mad_snoop_priv->agent);
+
 	kfree(mad_snoop_priv);
 }
 
@@ -1219,12 +1241,16 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
 
 	/* Walk list of send WRs and post each on send list */
 	for (; send_buf; send_buf = next_send_buf) {
-
 		mad_send_wr = container_of(send_buf,
 					   struct ib_mad_send_wr_private,
 					   send_buf);
 		mad_agent_priv = mad_send_wr->mad_agent_priv;
 
+		ret = ib_mad_enforce_security(mad_agent_priv,
+					      mad_send_wr->send_wr.pkey_index);
+		if (ret)
+			goto error;
+
 		if (!send_buf->mad_agent->send_handler ||
 		    (send_buf->timeout_ms &&
 		     !send_buf->mad_agent->recv_handler)) {
@@ -1958,6 +1984,14 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 	struct ib_mad_send_wr_private *mad_send_wr;
 	struct ib_mad_send_wc mad_send_wc;
 	unsigned long flags;
+	int ret;
+
+	ret = ib_mad_enforce_security(mad_agent_priv,
+				      mad_recv_wc->wc->pkey_index);
+	if (ret) {
+		ib_free_recv_mad(mad_recv_wc);
+		deref_mad_agent(mad_agent_priv);
+	}
 
 	INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
 	list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
@@ -2015,6 +2049,8 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
 						   mad_recv_wc);
 		deref_mad_agent(mad_agent_priv);
 	}
+
+	return;
 }
 
 static enum smi_action handle_ib_smi(const struct ib_mad_port_private *port_priv,
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 45800dd..bc09af5 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -39,6 +39,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
 #include "core_priv.h"
+#include "mad_priv.h"
 
 static struct pkey_index_qp_list *get_pkey_idx_qp_list(struct ib_port_pkey *pp)
 {
@@ -614,4 +615,95 @@ int ib_security_modify_qp(struct ib_qp *qp,
 }
 EXPORT_SYMBOL(ib_security_modify_qp);
 
+int ib_security_pkey_access(struct ib_device *dev,
+			    u8 port_num,
+			    u16 pkey_index,
+			    void *sec)
+{
+	u64 subnet_prefix;
+	u16 pkey;
+	int ret;
+
+	ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey);
+	if (ret)
+		return ret;
+
+	ret = ib_get_cached_subnet_prefix(dev, port_num, &subnet_prefix);
+
+	if (ret)
+		return ret;
+
+	return security_ib_pkey_access(sec, subnet_prefix, pkey);
+}
+EXPORT_SYMBOL(ib_security_pkey_access);
+
+static int ib_mad_agent_security_change(struct notifier_block *nb,
+					unsigned long event,
+					void *data)
+{
+	struct ib_mad_agent *ag = container_of(nb, struct ib_mad_agent, lsm_nb);
+
+	if (event != LSM_POLICY_CHANGE)
+		return NOTIFY_DONE;
+
+	ag->smp_allowed = !security_ib_endport_manage_subnet(ag->security,
+							     ag->device->name,
+							     ag->port_num);
+
+	return NOTIFY_OK;
+}
+
+int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
+				enum ib_qp_type qp_type)
+{
+	int ret;
+
+	ret = security_ib_alloc_security(&agent->security);
+	if (ret)
+		return ret;
+
+	if (qp_type != IB_QPT_SMI)
+		return 0;
+
+	ret = security_ib_endport_manage_subnet(agent->security,
+						agent->device->name,
+						agent->port_num);
+	if (ret)
+		return ret;
+
+	agent->lsm_nb.notifier_call = ib_mad_agent_security_change;
+	ret = register_lsm_notifier(&agent->lsm_nb);
+	if (ret)
+		return ret;
+
+	agent->smp_allowed = true;
+	agent->lsm_nb_reg = true;
+	return 0;
+}
+
+void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
+{
+	security_ib_free_security(agent->security);
+	if (agent->lsm_nb_reg)
+		unregister_lsm_notifier(&agent->lsm_nb);
+}
+
+int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index)
+{
+	int ret;
+
+	if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
+		return -EACCES;
+
+	ret = ib_security_pkey_access(map->agent.device,
+				      map->agent.port_num,
+				      pkey_index,
+				      map->agent.security);
+
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 #endif /* CONFIG_SECURITY_INFINIBAND */
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d576efc..3936216 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -911,6 +911,11 @@
  *	@subnet_prefix the subnet prefix of the port being used.
  *	@pkey the pkey to be accessed.
  *	@sec pointer to a security structure.
+ * @ib_endport_manage_subnet:
+ *	Check permissions to send and receive SMPs on a end port.
+ *	@dev_name the IB device name (i.e. mlx4_0).
+ *	@port_num the port number.
+ *	@sec pointer to a security structure.
  * @ib_alloc_security:
  *	Allocate a security structure for Infiniband objects.
  *	@sec pointer to a security structure pointer.
@@ -1629,6 +1634,8 @@ union security_list_options {
 
 #ifdef CONFIG_SECURITY_INFINIBAND
 	int (*ib_pkey_access)(void *sec, u64 subnet_prefix, u16 pkey);
+	int (*ib_endport_manage_subnet)(void *sec, const char *dev_name,
+					u8 port_num);
 	int (*ib_alloc_security)(void **sec);
 	void (*ib_free_security)(void *sec);
 #endif	/* CONFIG_SECURITY_INFINIBAND */
@@ -1865,6 +1872,7 @@ struct security_hook_heads {
 #endif	/* CONFIG_SECURITY_NETWORK */
 #ifdef CONFIG_SECURITY_INFINIBAND
 	struct list_head ib_pkey_access;
+	struct list_head ib_endport_manage_subnet;
 	struct list_head ib_alloc_security;
 	struct list_head ib_free_security;
 #endif	/* CONFIG_SECURITY_INFINIBAND */
diff --git a/include/linux/security.h b/include/linux/security.h
index 0a5de0c..fbd4bc7 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1419,6 +1419,7 @@ static inline int security_tun_dev_open(void *security)
 
 #ifdef CONFIG_SECURITY_INFINIBAND
 int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey);
+int security_ib_endport_manage_subnet(void *sec, const char *name, u8 port_num);
 int security_ib_alloc_security(void **sec);
 void security_ib_free_security(void *sec);
 #else	/* CONFIG_SECURITY_INFINIBAND */
@@ -1427,6 +1428,11 @@ static inline int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey
 	return 0;
 }
 
+static inline int security_ib_endport_manage_subnet(void *sec, const char *dev_name, u8 port_num)
+{
+	return 0;
+}
+
 static inline int security_ib_alloc_security(void **sec)
 {
 	return 0;
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index c8a773f..23c4a4e 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -537,6 +537,10 @@ struct ib_mad_agent {
 	u32			flags;
 	u8			port_num;
 	u8			rmpp_version;
+	void			*security;
+	bool			smp_allowed;
+	bool			lsm_nb_reg;
+	struct notifier_block   lsm_nb;
 };
 
 /**
diff --git a/security/security.c b/security/security.c
index 40326d4..e4fd2b6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1470,6 +1470,12 @@ int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey)
 }
 EXPORT_SYMBOL(security_ib_pkey_access);
 
+int security_ib_endport_manage_subnet(void *sec, const char *dev_name, u8 port_num)
+{
+	return call_int_hook(ib_endport_manage_subnet, 0, sec, dev_name, port_num);
+}
+EXPORT_SYMBOL(security_ib_endport_manage_subnet);
+
 int security_ib_alloc_security(void **sec)
 {
 	return call_int_hook(ib_alloc_security, 0, sec);
@@ -1943,6 +1949,8 @@ struct security_hook_heads security_hook_heads = {
 
 #ifdef CONFIG_SECURITY_INFINIBAND
 	.ib_pkey_access = LIST_HEAD_INIT(security_hook_heads.ib_pkey_access),
+	.ib_endport_manage_subnet =
+		LIST_HEAD_INIT(security_hook_heads.ib_endport_manage_subnet),
 	.ib_alloc_security =
 		LIST_HEAD_INIT(security_hook_heads.ib_alloc_security),
 	.ib_free_security =
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 3/9] selinux lsm IB/core: Implement LSM notification system
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
	sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
  Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add a generic notificaiton mechanism in the LSM. Interested consumers
can register a callback with the LSM and security modules can produce
events.

Because access to Infiniband QPs are enforced in the setup phase of a
connection security should be enforced again if the policy changes.
Register infiniband devices for policy change notification and check all
QPs on that device when the notification is received.

Add a call to the notification mechanism from SELinux when the AVC
cache changes or setenforce is cleared.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
v2:
- new patch that has the generic notification, replaces selinux and
  IB/core patches related to the ib_flush callback. Yuval Shaia and Paul
  Moore

v3:
- use notifier chains. Paul Moore

v4:
- Seperate avc callback for LSM notifier. Paul Moore

v5:
- Fix link error when CONFIG_SECURITY is not set. Build Robot
---
 drivers/infiniband/core/device.c | 53 ++++++++++++++++++++++++++++++++++++++++
 include/linux/security.h         | 23 +++++++++++++++++
 security/security.c              | 20 +++++++++++++++
 security/selinux/hooks.c         | 11 +++++++++
 security/selinux/selinuxfs.c     |  2 ++
 5 files changed, 109 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 5b42e83..7b6fd06 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,8 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
+#include <linux/security.h>
+#include <linux/notifier.h>
 #include <rdma/rdma_netlink.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
@@ -82,6 +84,14 @@ static LIST_HEAD(client_list);
 static DEFINE_MUTEX(device_mutex);
 static DECLARE_RWSEM(lists_rwsem);
 
+static int ib_security_change(struct notifier_block *nb, unsigned long event,
+			      void *lsm_data);
+static void ib_policy_change_task(struct work_struct *work);
+static DECLARE_WORK(ib_policy_change_work, ib_policy_change_task);
+
+static struct notifier_block ibdev_lsm_nb = {
+	.notifier_call = ib_security_change,
+};
 
 static int ib_device_check_mandatory(struct ib_device *device)
 {
@@ -344,6 +354,40 @@ static int setup_port_pkey_list(struct ib_device *device)
 	return 0;
 }
 
+static void ib_policy_change_task(struct work_struct *work)
+{
+	struct ib_device *dev;
+
+	down_read(&lists_rwsem);
+	list_for_each_entry(dev, &device_list, core_list) {
+		int i;
+
+		for (i = rdma_start_port(dev); i <= rdma_end_port(dev); i++) {
+			u64 sp;
+			int ret = ib_get_cached_subnet_prefix(dev,
+							      i,
+							      &sp);
+
+			WARN_ONCE(ret,
+				  "ib_get_cached_subnet_prefix err: %d, this should never happen here\n",
+				  ret);
+			ib_security_cache_change(dev, i, sp);
+		}
+	}
+	up_read(&lists_rwsem);
+}
+
+static int ib_security_change(struct notifier_block *nb, unsigned long event,
+			      void *lsm_data)
+{
+	if (event != LSM_POLICY_CHANGE)
+		return NOTIFY_DONE;
+
+	schedule_work(&ib_policy_change_work);
+
+	return NOTIFY_OK;
+}
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -1075,10 +1119,18 @@ static int __init ib_core_init(void)
 		goto err_sa;
 	}
 
+	ret = register_lsm_notifier(&ibdev_lsm_nb);
+	if (ret) {
+		pr_warn("Couldn't register LSM notifier. ret %d\n", ret);
+		goto err_ibnl_clients;
+	}
+
 	ib_cache_setup();
 
 	return 0;
 
+err_ibnl_clients:
+	ib_remove_ibnl_clients();
 err_sa:
 	ib_sa_cleanup();
 err_mad:
@@ -1098,6 +1150,7 @@ static int __init ib_core_init(void)
 
 static void __exit ib_core_cleanup(void)
 {
+	unregister_lsm_notifier(&ibdev_lsm_nb);
 	ib_cache_cleanup();
 	ib_remove_ibnl_clients();
 	ib_sa_cleanup();
diff --git a/include/linux/security.h b/include/linux/security.h
index 342ca4c..0a5de0c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -69,6 +69,10 @@ struct audit_krule;
 struct user_namespace;
 struct timezone;
 
+enum lsm_event {
+	LSM_POLICY_CHANGE,
+};
+
 /* These functions are in security/commoncap.c */
 extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
 		       int cap, int audit);
@@ -161,6 +165,10 @@ struct security_mnt_opts {
 	int num_mnt_opts;
 };
 
+int call_lsm_notifier(enum lsm_event event, void *data);
+int register_lsm_notifier(struct notifier_block *nb);
+int unregister_lsm_notifier(struct notifier_block *nb);
+
 static inline void security_init_mnt_opts(struct security_mnt_opts *opts)
 {
 	opts->mnt_opts = NULL;
@@ -377,6 +385,21 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 struct security_mnt_opts {
 };
 
+static inline int call_lsm_notifier(enum lsm_event event, void *data)
+{
+	return 0;
+}
+
+static inline int register_lsm_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline  int unregister_lsm_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
 static inline void security_init_mnt_opts(struct security_mnt_opts *opts)
 {
 }
diff --git a/security/security.c b/security/security.c
index 7d3bf2f..40326d4 100644
--- a/security/security.c
+++ b/security/security.c
@@ -33,6 +33,8 @@
 /* Maximum number of letters for an LSM name string */
 #define SECURITY_NAME_MAX	10
 
+static ATOMIC_NOTIFIER_HEAD(lsm_notifier_chain);
+
 /* Boot-time LSM user choice */
 static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
 	CONFIG_DEFAULT_SECURITY;
@@ -98,6 +100,24 @@ int __init security_module_enable(const char *module)
 	return !strcmp(module, chosen_lsm);
 }
 
+int call_lsm_notifier(enum lsm_event event, void *data)
+{
+	return atomic_notifier_call_chain(&lsm_notifier_chain, event, data);
+}
+EXPORT_SYMBOL(call_lsm_notifier);
+
+int register_lsm_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&lsm_notifier_chain, nb);
+}
+EXPORT_SYMBOL(register_lsm_notifier);
+
+int unregister_lsm_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&lsm_notifier_chain, nb);
+}
+EXPORT_SYMBOL(unregister_lsm_notifier);
+
 /*
  * Hook list operation macros.
  *
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 09fd610..2d7a7c1 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -170,6 +170,14 @@ static int selinux_netcache_avc_callback(u32 event)
 	return 0;
 }
 
+static int selinux_lsm_notifier_avc_callback(u32 event)
+{
+	if (event == AVC_CALLBACK_RESET)
+		call_lsm_notifier(LSM_POLICY_CHANGE, NULL);
+
+	return 0;
+}
+
 /*
  * initialise the security for the init task
  */
@@ -6325,6 +6333,9 @@ static __init int selinux_init(void)
 	if (avc_add_callback(selinux_netcache_avc_callback, AVC_CALLBACK_RESET))
 		panic("SELinux: Unable to register AVC netcache callback\n");
 
+	if (avc_add_callback(selinux_lsm_notifier_avc_callback, AVC_CALLBACK_RESET))
+		panic("SELinux: Unable to register AVC LSM notifier callback\n");
+
 	if (selinux_enforcing)
 		printk(KERN_DEBUG "SELinux:  Starting in enforcing mode\n");
 	else
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 72c145d..d3f9192 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -177,6 +177,8 @@ static ssize_t sel_write_enforce(struct file *file, const char __user *buf,
 			avc_ss_reset(0);
 		selnl_notify_setenforce(selinux_enforcing);
 		selinux_status_update_setenforce(selinux_enforcing);
+		if (!selinux_enforcing)
+			call_lsm_notifier(LSM_POLICY_CHANGE, NULL);
 	}
 	length = count;
 out:
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 2/9] IB/core: Enforce PKey security on QPs
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
	sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
  Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.

Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.

When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.

Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.

In order to maintain access control if there are PKey table or subnet
prefix change keep a list of all QPs are using each PKey index on
each port. If a change occurs all QPs using that device and port must
have access enforced for the new cache settings.

These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.

1. When a QP is modified to a particular Port, PKey index or alternate
   path insert that QP into the appropriate lists.

2. Check permission to access the new settings.

3. If step 2 grants access attempt to modify the QP.

4a. If steps 2 and 3 succeed remove any prior associations.

4b. If ether fails remove the new setting associations.

If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once the moving
the QP to error is complete the security structure mark is cleared.

Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still listed on an error_list wait for it to be processed by that
flow before cleaning up the structure.

If the destroy fails the QPs port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.

To keep the security changes isolated a new file is used to hold security
related functionality.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
v2:
- Squashed LSM hook additions. Paul Moore
- Changed security blobs to void*. Paul Moore

v3:
- Change parameter order of pkey_access hook. Paul Moore
---
 drivers/infiniband/core/Makefile     |   3 +-
 drivers/infiniband/core/cache.c      |  21 +-
 drivers/infiniband/core/core_priv.h  |  77 +++++
 drivers/infiniband/core/device.c     |  33 ++
 drivers/infiniband/core/security.c   | 617 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_cmd.c |  20 +-
 drivers/infiniband/core/verbs.c      |  27 +-
 include/linux/lsm_hooks.h            |  27 ++
 include/linux/security.h             |  21 ++
 include/rdma/ib_verbs.h              |  48 +++
 security/Kconfig                     |   9 +
 security/security.c                  |  31 ++
 12 files changed, 925 insertions(+), 9 deletions(-)
 create mode 100644 drivers/infiniband/core/security.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..da4e2c1 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
-				multicast.o mad.o smi.o agent.o mad_rmpp.o
+				multicast.o mad.o smi.o agent.o mad_rmpp.o \
+				security.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index affc8ef..48eaeca 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -53,6 +53,7 @@ struct ib_update_work {
 	struct work_struct work;
 	struct ib_device  *device;
 	u8                 port_num;
+	bool		   enforce_security;
 };
 
 union ib_gid zgid;
@@ -1046,7 +1047,8 @@ int ib_get_cached_lmc(struct ib_device *device,
 EXPORT_SYMBOL(ib_get_cached_lmc);
 
 static void ib_cache_update(struct ib_device *device,
-			    u8                port)
+			    u8                port,
+			    bool	      enforce_security)
 {
 	struct ib_port_attr       *tprops = NULL;
 	struct ib_pkey_cache      *pkey_cache = NULL, *old_pkey_cache;
@@ -1134,6 +1136,11 @@ static void ib_cache_update(struct ib_device *device,
 							tprops->subnet_prefix;
 	write_unlock_irq(&device->cache.lock);
 
+	if (enforce_security)
+		ib_security_cache_change(device,
+					 port,
+					 tprops->subnet_prefix);
+
 	kfree(gid_cache);
 	kfree(old_pkey_cache);
 	kfree(tprops);
@@ -1150,7 +1157,9 @@ static void ib_cache_task(struct work_struct *_work)
 	struct ib_update_work *work =
 		container_of(_work, struct ib_update_work, work);
 
-	ib_cache_update(work->device, work->port_num);
+	ib_cache_update(work->device,
+			work->port_num,
+			work->enforce_security);
 	kfree(work);
 }
 
@@ -1171,6 +1180,12 @@ static void ib_cache_event(struct ib_event_handler *handler,
 			INIT_WORK(&work->work, ib_cache_task);
 			work->device   = event->device;
 			work->port_num = event->element.port_num;
+			if (event->event == IB_EVENT_PKEY_CHANGE ||
+			    event->event == IB_EVENT_GID_CHANGE)
+				work->enforce_security = true;
+			else
+				work->enforce_security = false;
+
 			queue_work(ib_wq, &work->work);
 		}
 	}
@@ -1211,7 +1226,7 @@ int ib_cache_setup_one(struct ib_device *device)
 		return err;
 
 	for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p)
-		ib_cache_update(device, p + rdma_start_port(device));
+		ib_cache_update(device, p + rdma_start_port(device), true);
 
 	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
 			      device, ib_cache_event);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index ce826e4..68e3de0 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,6 +38,14 @@
 
 #include <rdma/ib_verbs.h>
 
+struct pkey_index_qp_list {
+	struct list_head    pkey_index_list;
+	u16                 pkey_index;
+	/* Lock to hold while iterating the qp_list. */
+	spinlock_t          qp_list_lock;
+	struct list_head    qp_list;
+};
+
 #if IS_ENABLED(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS)
 int cma_configfs_init(void);
 void cma_configfs_exit(void);
@@ -156,4 +164,73 @@ int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
 int ib_get_cached_subnet_prefix(struct ib_device *device,
 				u8                port_num,
 				u64              *sn_pfx);
+
+#ifdef CONFIG_SECURITY_INFINIBAND
+void ib_security_destroy_port_pkey_list(struct ib_device *device);
+
+void ib_security_cache_change(struct ib_device *device,
+			      u8 port_num,
+			      u64 subnet_prefix);
+
+int ib_security_modify_qp(struct ib_qp *qp,
+			  struct ib_qp_attr *qp_attr,
+			  int qp_attr_mask,
+			  struct ib_udata *udata);
+
+int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev);
+void ib_destroy_qp_security_begin(struct ib_qp_security *sec);
+void ib_destroy_qp_security_abort(struct ib_qp_security *sec);
+void ib_destroy_qp_security_end(struct ib_qp_security *sec);
+int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev);
+void ib_close_shared_qp_security(struct ib_qp_security *sec);
+#else
+static inline void ib_security_destroy_port_pkey_list(struct ib_device *device)
+{
+}
+
+static inline void ib_security_cache_change(struct ib_device *device,
+					    u8 port_num,
+					    u64 subnet_prefix)
+{
+}
+
+static inline int ib_security_modify_qp(struct ib_qp *qp,
+					struct ib_qp_attr *qp_attr,
+					int qp_attr_mask,
+					struct ib_udata *udata)
+{
+	return qp->device->modify_qp(qp->real_qp,
+				     qp_attr,
+				     qp_attr_mask,
+				     udata);
+}
+
+static inline int ib_create_qp_security(struct ib_qp *qp,
+					struct ib_device *dev)
+{
+	return 0;
+}
+
+static inline void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
+{
+}
+
+static inline void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
+{
+}
+
+static inline void ib_destroy_qp_security_end(struct ib_qp_security *sec)
+{
+}
+
+static inline int ib_open_shared_qp_security(struct ib_qp *qp,
+					     struct ib_device *dev)
+{
+	return 0;
+}
+
+static inline void ib_close_shared_qp_security(struct ib_qp_security *sec)
+{
+}
+#endif
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 760ef60..5b42e83 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -320,6 +320,30 @@ void ib_get_device_fw_str(struct ib_device *dev, char *str, size_t str_len)
 }
 EXPORT_SYMBOL(ib_get_device_fw_str);
 
+static int setup_port_pkey_list(struct ib_device *device)
+{
+	int i;
+
+	/**
+	 * device->port_pkey_list is indexed directly by the port number,
+	 * Therefore it is declared as a 1 based array with potential empty
+	 * slots at the beginning.
+	 */
+	device->port_pkey_list = kzalloc(sizeof(*device->port_pkey_list)
+					 * (rdma_end_port(device) + 1),
+					 GFP_KERNEL);
+
+	if (!device->port_pkey_list)
+		return -ENOMEM;
+
+	for (i = 0; i < (rdma_end_port(device) + 1); i++) {
+		spin_lock_init(&device->port_pkey_list[i].list_lock);
+		INIT_LIST_HEAD(&device->port_pkey_list[i].pkey_list);
+	}
+
+	return 0;
+}
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -357,6 +381,12 @@ int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
+	ret = setup_port_pkey_list(device);
+	if (ret) {
+		dev_warn(device->dma_device, "Couldn't create per port_pkey_list\n");
+		goto out;
+	}
+
 	ret = ib_cache_setup_one(device);
 	if (ret) {
 		pr_warn("Couldn't set up InfiniBand P_Key/GID cache\n");
@@ -427,6 +457,9 @@ void ib_unregister_device(struct ib_device *device)
 	ib_device_unregister_sysfs(device);
 	ib_cache_cleanup_one(device);
 
+	ib_security_destroy_port_pkey_list(device);
+	kfree(device->port_pkey_list);
+
 	down_write(&lists_rwsem);
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
new file mode 100644
index 0000000..45800dd
--- /dev/null
+++ b/drivers/infiniband/core/security.c
@@ -0,0 +1,617 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef CONFIG_SECURITY_INFINIBAND
+
+#include <linux/security.h>
+#include <linux/completion.h>
+#include <linux/list.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_cache.h>
+#include "core_priv.h"
+
+static struct pkey_index_qp_list *get_pkey_idx_qp_list(struct ib_port_pkey *pp)
+{
+	struct pkey_index_qp_list *pkey = NULL;
+	struct pkey_index_qp_list *tmp_pkey;
+	struct ib_device *dev = pp->sec->dev;
+
+	spin_lock(&dev->port_pkey_list[pp->port_num].list_lock);
+	list_for_each_entry(tmp_pkey,
+			    &dev->port_pkey_list[pp->port_num].pkey_list,
+			    pkey_index_list) {
+		if (tmp_pkey->pkey_index == pp->pkey_index) {
+			pkey = tmp_pkey;
+			break;
+		}
+	}
+	spin_unlock(&dev->port_pkey_list[pp->port_num].list_lock);
+	return pkey;
+}
+
+static int get_pkey_and_subnet_prefix(struct ib_port_pkey *pp,
+				      u16 *pkey,
+				      u64 *subnet_prefix)
+{
+	struct ib_device *dev = pp->sec->dev;
+	int ret;
+
+	ret = ib_get_cached_pkey(dev, pp->port_num, pp->pkey_index, pkey);
+	if (ret)
+		return ret;
+
+	ret = ib_get_cached_subnet_prefix(dev, pp->port_num, subnet_prefix);
+
+	return ret;
+}
+
+static int enforce_qp_pkey_security(u16 pkey,
+				    u64 subnet_prefix,
+				    struct ib_qp_security *qp_sec)
+{
+	struct ib_qp_security *shared_qp_sec;
+	int ret;
+
+	ret = security_ib_pkey_access(qp_sec->security, subnet_prefix, pkey);
+	if (ret)
+		return ret;
+
+	if (qp_sec->qp == qp_sec->qp->real_qp) {
+		list_for_each_entry(shared_qp_sec,
+				    &qp_sec->shared_qp_list,
+				    shared_qp_list) {
+			ret = security_ib_pkey_access(shared_qp_sec->security,
+						      subnet_prefix,
+						      pkey);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
+/* The caller of this function must hold the QP security
+ * mutex of the QP of the security structure in *pps.
+ *
+ * It takes separate ports_pkeys and security structure
+ * because in some cases the pps will be for a new settings
+ * or the pps will be for the real QP and security structure
+ * will be for a shared QP.
+ */
+static int check_qp_port_pkey_settings(struct ib_ports_pkeys *pps,
+				       struct ib_qp_security *sec)
+{
+	u64 subnet_prefix;
+	u16 pkey;
+	int ret = 0;
+
+	if (!pps)
+		return 0;
+
+	if (pps->main.state != IB_PORT_PKEY_NOT_VALID) {
+		get_pkey_and_subnet_prefix(&pps->main,
+					   &pkey,
+					   &subnet_prefix);
+
+		ret = enforce_qp_pkey_security(pkey,
+					       subnet_prefix,
+					       sec);
+	}
+	if (ret)
+		goto out;
+
+	if (pps->alt.state != IB_PORT_PKEY_NOT_VALID) {
+		get_pkey_and_subnet_prefix(&pps->alt,
+					   &pkey,
+					   &subnet_prefix);
+
+		ret = enforce_qp_pkey_security(pkey,
+					       subnet_prefix,
+					       sec);
+	}
+
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+/* The caller of this function must hold the QP security
+ * mutex.
+ */
+static void qp_to_error(struct ib_qp_security *sec)
+{
+	struct ib_qp_security *shared_qp_sec;
+	struct ib_qp_attr attr = {
+		.qp_state = IB_QPS_ERR
+	};
+	struct ib_event event = {
+		.event = IB_EVENT_QP_FATAL
+	};
+
+	/* If the QP is in the process of being destroyed
+	 * the qp pointer in the security structure is
+	 * undefined.  It cannot be modified now.
+	 */
+	if (sec->destroying)
+		return;
+
+	ib_modify_qp(sec->qp,
+		     &attr,
+		     IB_QP_STATE);
+
+	if (sec->qp->event_handler && sec->qp->qp_context) {
+		event.element.qp = sec->qp;
+		sec->qp->event_handler(&event,
+				       sec->qp->qp_context);
+	}
+
+	list_for_each_entry(shared_qp_sec,
+			    &sec->shared_qp_list,
+			    shared_qp_list) {
+		struct ib_qp *qp = shared_qp_sec->qp;
+
+		if (qp->event_handler && qp->qp_context) {
+			event.element.qp = qp;
+			event.device = qp->device;
+			qp->event_handler(&event,
+					  qp->qp_context);
+		}
+	}
+}
+
+static inline void check_pkey_qps(struct pkey_index_qp_list *pkey,
+				  struct ib_device *device,
+				  u8 port_num,
+				  u64 subnet_prefix)
+{
+	struct ib_port_pkey *pp, *tmp_pp;
+	bool comp;
+	LIST_HEAD(to_error_list);
+	u16 pkey_val;
+
+	if (!ib_get_cached_pkey(device,
+				port_num,
+				pkey->pkey_index,
+				&pkey_val)) {
+		spin_lock(&pkey->qp_list_lock);
+		list_for_each_entry(pp, &pkey->qp_list, qp_list) {
+			if (atomic_read(&pp->sec->error_list_count))
+				continue;
+
+			if (enforce_qp_pkey_security(pkey_val,
+						     subnet_prefix,
+						     pp->sec)) {
+				atomic_inc(&pp->sec->error_list_count);
+				list_add(&pp->to_error_list,
+					 &to_error_list);
+			}
+		}
+		spin_unlock(&pkey->qp_list_lock);
+	}
+
+	list_for_each_entry_safe(pp,
+				 tmp_pp,
+				 &to_error_list,
+				 to_error_list) {
+		mutex_lock(&pp->sec->mutex);
+		qp_to_error(pp->sec);
+		list_del(&pp->to_error_list);
+		atomic_dec(&pp->sec->error_list_count);
+		comp = pp->sec->destroying;
+		mutex_unlock(&pp->sec->mutex);
+
+		if (comp)
+			complete(&pp->sec->error_complete);
+	}
+}
+
+/* The caller of this function must hold the QP security
+ * mutex.
+ */
+static int port_pkey_list_insert(struct ib_port_pkey *pp)
+{
+	struct pkey_index_qp_list *tmp_pkey;
+	struct pkey_index_qp_list *pkey;
+	struct ib_device *dev;
+	u8 port_num = pp->port_num;
+	int ret = 0;
+
+	if (pp->state != IB_PORT_PKEY_VALID)
+		return 0;
+
+	dev = pp->sec->dev;
+
+	pkey = get_pkey_idx_qp_list(pp);
+
+	if (!pkey) {
+		bool found = false;
+
+		pkey = kzalloc(sizeof(*pkey), GFP_KERNEL);
+		if (!pkey)
+			return -ENOMEM;
+
+		spin_lock(&dev->port_pkey_list[port_num].list_lock);
+		/* Check for the PKey again.  A racing process may
+		 * have created it.
+		 */
+		list_for_each_entry(tmp_pkey,
+				    &dev->port_pkey_list[port_num].pkey_list,
+				    pkey_index_list) {
+			if (tmp_pkey->pkey_index == pp->pkey_index) {
+				kfree(pkey);
+				pkey = tmp_pkey;
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			pkey->pkey_index = pp->pkey_index;
+			spin_lock_init(&pkey->qp_list_lock);
+			INIT_LIST_HEAD(&pkey->qp_list);
+			list_add(&pkey->pkey_index_list,
+				 &dev->port_pkey_list[port_num].pkey_list);
+		}
+		spin_unlock(&dev->port_pkey_list[port_num].list_lock);
+	}
+
+	spin_lock(&pkey->qp_list_lock);
+	list_add(&pp->qp_list, &pkey->qp_list);
+	spin_unlock(&pkey->qp_list_lock);
+
+	pp->state = IB_PORT_PKEY_LISTED;
+
+	return ret;
+}
+
+/* The caller of this function must hold the QP security
+ * mutex.
+ */
+static void port_pkey_list_remove(struct ib_port_pkey *pp)
+{
+	struct pkey_index_qp_list *pkey;
+
+	if (pp->state != IB_PORT_PKEY_LISTED)
+		return;
+
+	pkey = get_pkey_idx_qp_list(pp);
+
+	spin_lock(&pkey->qp_list_lock);
+	list_del(&pp->qp_list);
+	spin_unlock(&pkey->qp_list_lock);
+
+	/* The setting may still be valid, i.e. after
+	 * a destroy has failed for example.
+	 */
+	pp->state = IB_PORT_PKEY_VALID;
+}
+
+static void destroy_qp_security(struct ib_qp_security *sec)
+{
+	security_ib_free_security(sec->security);
+	kfree(sec->ports_pkeys);
+	kfree(sec);
+}
+
+/* The caller of this function must hold the QP security
+ * mutex.
+ */
+static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
+					  const struct ib_qp_attr *qp_attr,
+					  int qp_attr_mask)
+{
+	struct ib_ports_pkeys *new_pps;
+	struct ib_ports_pkeys *qp_pps = qp->qp_sec->ports_pkeys;
+
+	new_pps = kzalloc(sizeof(*new_pps), GFP_KERNEL);
+	if (!new_pps)
+		return NULL;
+
+	if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) {
+		if (!qp_pps) {
+			new_pps->main.port_num = qp_attr->port_num;
+			new_pps->main.pkey_index = qp_attr->pkey_index;
+		} else {
+			new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ?
+						  qp_attr->port_num :
+						  qp_pps->main.port_num;
+
+			new_pps->main.pkey_index =
+					(qp_attr_mask & IB_QP_PKEY_INDEX) ?
+					 qp_attr->pkey_index :
+					 qp_pps->main.pkey_index;
+		}
+		new_pps->main.state = IB_PORT_PKEY_VALID;
+	} else if (qp_pps) {
+		new_pps->main.port_num = qp_pps->main.port_num;
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
+			new_pps->main.state = IB_PORT_PKEY_VALID;
+	}
+
+	if (qp_attr_mask & IB_QP_ALT_PATH) {
+		new_pps->alt.port_num = qp_attr->alt_port_num;
+		new_pps->alt.pkey_index = qp_attr->alt_pkey_index;
+		new_pps->alt.state = IB_PORT_PKEY_VALID;
+	} else if (qp_pps) {
+		new_pps->alt.port_num = qp_pps->alt.port_num;
+		new_pps->alt.pkey_index = qp_pps->alt.pkey_index;
+		if (qp_pps->alt.state != IB_PORT_PKEY_NOT_VALID)
+			new_pps->alt.state = IB_PORT_PKEY_VALID;
+	}
+
+	new_pps->main.sec = qp->qp_sec;
+	new_pps->alt.sec = qp->qp_sec;
+	return new_pps;
+}
+
+int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev)
+{
+	struct ib_qp *real_qp = qp->real_qp;
+	int ret;
+
+	ret = ib_create_qp_security(qp, dev);
+
+	if (ret)
+		goto out;
+
+	mutex_lock(&real_qp->qp_sec->mutex);
+	ret = check_qp_port_pkey_settings(real_qp->qp_sec->ports_pkeys,
+					  qp->qp_sec);
+
+	if (ret)
+		goto ret;
+
+	if (qp != real_qp)
+		list_add(&qp->qp_sec->shared_qp_list,
+			 &real_qp->qp_sec->shared_qp_list);
+ret:
+	mutex_unlock(&real_qp->qp_sec->mutex);
+	if (ret)
+		destroy_qp_security(qp->qp_sec);
+
+out:
+	return ret;
+}
+
+void ib_close_shared_qp_security(struct ib_qp_security *sec)
+{
+	struct ib_qp *real_qp = sec->qp->real_qp;
+
+	mutex_lock(&real_qp->qp_sec->mutex);
+	list_del(&sec->shared_qp_list);
+	mutex_unlock(&real_qp->qp_sec->mutex);
+
+	destroy_qp_security(sec);
+}
+
+int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
+{
+	int ret;
+
+	qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL);
+	if (!qp->qp_sec)
+		return -ENOMEM;
+
+	qp->qp_sec->qp = qp;
+	qp->qp_sec->dev = dev;
+	mutex_init(&qp->qp_sec->mutex);
+	INIT_LIST_HEAD(&qp->qp_sec->shared_qp_list);
+	atomic_set(&qp->qp_sec->error_list_count, 0);
+	init_completion(&qp->qp_sec->error_complete);
+	ret = security_ib_alloc_security(&qp->qp_sec->security);
+	if (ret)
+		kfree(qp->qp_sec);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_create_qp_security);
+
+void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
+{
+	mutex_lock(&sec->mutex);
+
+	/* Remove the QP from the lists so it won't get added to
+	 * a to_error_list during the destroy process.
+	 */
+	if (sec->ports_pkeys) {
+		port_pkey_list_remove(&sec->ports_pkeys->main);
+		port_pkey_list_remove(&sec->ports_pkeys->alt);
+	}
+
+	/* If the QP is already in one or more of those lists
+	 * the destroying flag will ensure the to error flow
+	 * doesn't operate on an undefined QP.
+	 */
+	sec->destroying = true;
+
+	/* Record the error list count to know how many completions
+	 * to wait for.
+	 */
+	sec->error_comps_pending = atomic_read(&sec->error_list_count);
+
+	mutex_unlock(&sec->mutex);
+}
+
+void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
+{
+	int ret;
+	int i;
+
+	/* If a concurrent cache update is in progress this
+	 * QP security could be marked for an error state
+	 * transition.  Wait for this to complete.
+	 */
+	for (i = 0; i < sec->error_comps_pending; i++)
+		wait_for_completion(&sec->error_complete);
+
+	mutex_lock(&sec->mutex);
+	sec->destroying = false;
+
+	/* Restore the position in the lists and verify
+	 * access is still allowed in case a cache update
+	 * occurred while attempting to destroy.
+	 *
+	 * Because these setting were listed already
+	 * and removed during ib_destroy_qp_security_begin
+	 * we know the pkey_index_qp_list for the PKey
+	 * already exists so port_pkey_list_insert won't fail.
+	 */
+	if (sec->ports_pkeys) {
+		port_pkey_list_insert(&sec->ports_pkeys->main);
+		port_pkey_list_insert(&sec->ports_pkeys->alt);
+	}
+
+	ret = check_qp_port_pkey_settings(sec->ports_pkeys, sec);
+	if (ret)
+		qp_to_error(sec);
+
+	mutex_unlock(&sec->mutex);
+}
+
+void ib_destroy_qp_security_end(struct ib_qp_security *sec)
+{
+	int i;
+
+	/* If a concurrent cache update is occurring we must
+	 * wait until this QP security structure is processed
+	 * in the QP to error flow before destroying it because
+	 * the to_error_list is in use.
+	 */
+	for (i = 0; i < sec->error_comps_pending; i++)
+		wait_for_completion(&sec->error_complete);
+
+	destroy_qp_security(sec);
+}
+
+void ib_security_cache_change(struct ib_device *device,
+			      u8 port_num,
+			      u64 subnet_prefix)
+{
+	struct pkey_index_qp_list *pkey;
+
+	list_for_each_entry(pkey,
+			    &device->port_pkey_list[port_num].pkey_list,
+			    pkey_index_list) {
+		check_pkey_qps(pkey,
+			       device,
+			       port_num,
+			       subnet_prefix);
+	}
+}
+
+void ib_security_destroy_port_pkey_list(struct ib_device *device)
+{
+	struct pkey_index_qp_list *pkey, *tmp_pkey;
+	int i;
+
+	for (i = rdma_start_port(device); i <= rdma_end_port(device); i++) {
+		spin_lock(&device->port_pkey_list[i].list_lock);
+		list_for_each_entry_safe(pkey,
+					 tmp_pkey,
+					 &device->port_pkey_list[i].pkey_list,
+					 pkey_index_list) {
+			list_del(&pkey->pkey_index_list);
+			kfree(pkey);
+		}
+		spin_unlock(&device->port_pkey_list[i].list_lock);
+	}
+}
+
+int ib_security_modify_qp(struct ib_qp *qp,
+			  struct ib_qp_attr *qp_attr,
+			  int qp_attr_mask,
+			  struct ib_udata *udata)
+{
+	int ret = 0;
+	struct ib_ports_pkeys *tmp_pps;
+	struct ib_ports_pkeys *new_pps;
+	bool special_qp = (qp->qp_type == IB_QPT_SMI ||
+			   qp->qp_type == IB_QPT_GSI);
+	bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
+			   (qp_attr_mask & IB_QP_ALT_PATH));
+
+	if (pps_change && !special_qp) {
+		mutex_lock(&qp->qp_sec->mutex);
+		new_pps = get_new_pps(qp,
+				      qp_attr,
+				      qp_attr_mask);
+
+		/* Add this QP to the lists for the new port
+		 * and pkey settings before checking for permission
+		 * in case there is a concurrent cache update
+		 * occurring.  Walking the list for a cache change
+		 * doesn't acquire the security mutex unless it's
+		 * sending the QP to error.
+		 */
+		ret = port_pkey_list_insert(&new_pps->main);
+
+		if (!ret)
+			ret = port_pkey_list_insert(&new_pps->alt);
+
+		if (!ret)
+			ret = check_qp_port_pkey_settings(new_pps,
+							  qp->qp_sec);
+	}
+
+	if (!ret)
+		ret = qp->device->modify_qp(qp->real_qp,
+					    qp_attr,
+					    qp_attr_mask,
+					    udata);
+
+	if (pps_change && !special_qp) {
+		/* Clean up the lists and free the appropriate
+		 * ports_pkeys structure.
+		 */
+		if (ret) {
+			tmp_pps = new_pps;
+		} else {
+			tmp_pps = qp->qp_sec->ports_pkeys;
+			qp->qp_sec->ports_pkeys = new_pps;
+		}
+
+		if (tmp_pps) {
+			port_pkey_list_remove(&tmp_pps->main);
+			port_pkey_list_remove(&tmp_pps->alt);
+		}
+		kfree(tmp_pps);
+		mutex_unlock(&qp->qp_sec->mutex);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(ib_security_modify_qp);
+
+#endif /* CONFIG_SECURITY_INFINIBAND */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..ceb9154 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -37,7 +37,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
-
+#include <linux/security.h>
 #include <asm/uaccess.h>
 
 #include "uverbs.h"
@@ -1915,6 +1915,10 @@ static int create_qp(struct ib_uverbs_file *file,
 	}
 
 	if (cmd->qp_type != IB_QPT_XRC_TGT) {
+		ret = ib_create_qp_security(qp, device);
+		if (ret)
+			goto err_destroy;
+
 		qp->real_qp	  = qp;
 		qp->device	  = device;
 		qp->pd		  = pd;
@@ -2405,10 +2409,18 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 		ret = ib_resolve_eth_dmac(qp, attr, &cmd.attr_mask);
 		if (ret)
 			goto release_qp;
-		ret = qp->device->modify_qp(qp, attr,
-			modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
+
+		ret = ib_security_modify_qp(qp,
+					    attr,
+					    modify_qp_mask(qp->qp_type,
+							   cmd.attr_mask),
+					    &udata);
 	} else {
-		ret = ib_modify_qp(qp, attr, modify_qp_mask(qp->qp_type, cmd.attr_mask));
+		ret = ib_security_modify_qp(qp,
+					    attr,
+					    modify_qp_mask(qp->qp_type,
+							   cmd.attr_mask),
+					    NULL);
 	}
 
 	if (ret)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..9845e37 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -44,6 +44,7 @@
 #include <linux/in.h>
 #include <linux/in6.h>
 #include <net/addrconf.h>
+#include <linux/security.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
@@ -708,12 +709,20 @@ static struct ib_qp *__ib_open_qp(struct ib_qp *real_qp,
 {
 	struct ib_qp *qp;
 	unsigned long flags;
+	int err;
 
 	qp = kzalloc(sizeof *qp, GFP_KERNEL);
 	if (!qp)
 		return ERR_PTR(-ENOMEM);
 
 	qp->real_qp = real_qp;
+	err = ib_open_shared_qp_security(qp, real_qp->device);
+	if (err) {
+		kfree(qp);
+		return ERR_PTR(err);
+	}
+
+	qp->real_qp = real_qp;
 	atomic_inc(&real_qp->usecnt);
 	qp->device = real_qp->device;
 	qp->event_handler = event_handler;
@@ -799,6 +808,12 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	if (IS_ERR(qp))
 		return qp;
 
+	ret = ib_create_qp_security(qp, device);
+	if (ret) {
+		ib_destroy_qp(qp);
+		return ERR_PTR(ret);
+	}
+
 	qp->device     = device;
 	qp->real_qp    = qp;
 	qp->uobject    = NULL;
@@ -1257,7 +1272,7 @@ int ib_modify_qp(struct ib_qp *qp,
 	if (ret)
 		return ret;
 
-	return qp->device->modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL);
+	return ib_security_modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL);
 }
 EXPORT_SYMBOL(ib_modify_qp);
 
@@ -1286,6 +1301,7 @@ int ib_close_qp(struct ib_qp *qp)
 	spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);
 
 	atomic_dec(&real_qp->usecnt);
+	ib_close_shared_qp_security(qp->qp_sec);
 	kfree(qp);
 
 	return 0;
@@ -1326,6 +1342,7 @@ int ib_destroy_qp(struct ib_qp *qp)
 	struct ib_cq *scq, *rcq;
 	struct ib_srq *srq;
 	struct ib_rwq_ind_table *ind_tbl;
+	struct ib_qp_security *sec;
 	int ret;
 
 	WARN_ON_ONCE(qp->mrs_used > 0);
@@ -1341,6 +1358,9 @@ int ib_destroy_qp(struct ib_qp *qp)
 	rcq  = qp->recv_cq;
 	srq  = qp->srq;
 	ind_tbl = qp->rwq_ind_tbl;
+	sec  = qp->qp_sec;
+	if (sec)
+		ib_destroy_qp_security_begin(sec);
 
 	if (!qp->uobject)
 		rdma_rw_cleanup_mrs(qp);
@@ -1357,6 +1377,11 @@ int ib_destroy_qp(struct ib_qp *qp)
 			atomic_dec(&srq->usecnt);
 		if (ind_tbl)
 			atomic_dec(&ind_tbl->usecnt);
+		if (sec)
+			ib_destroy_qp_security_end(sec);
+	} else {
+		if (sec)
+			ib_destroy_qp_security_abort(sec);
 	}
 
 	return ret;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 558adfa..d576efc 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -8,6 +8,7 @@
  * Copyright (C) 2001 Silicon Graphics, Inc. (Trust Technology Group)
  * Copyright (C) 2015 Intel Corporation.
  * Copyright (C) 2015 Casey Schaufler <casey-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
+ * Copyright (C) 2016 Mellanox Techonologies
  *
  *	This program is free software; you can redistribute it and/or modify
  *	it under the terms of the GNU General Public License as published by
@@ -903,6 +904,21 @@
  *	associated with the TUN device's security structure.
  *	@security pointer to the TUN devices's security structure.
  *
+ * Security hooks for Infiniband
+ *
+ * @ib_pkey_access:
+ *	Check permission to access a pkey when modifing a QP.
+ *	@subnet_prefix the subnet prefix of the port being used.
+ *	@pkey the pkey to be accessed.
+ *	@sec pointer to a security structure.
+ * @ib_alloc_security:
+ *	Allocate a security structure for Infiniband objects.
+ *	@sec pointer to a security structure pointer.
+ *	Returns 0 on success, non-zero on failure
+ * @ib_free_security:
+ *	Deallocate an Infiniband security structure.
+ *	@sec contains the security structure to be freed.
+ *
  * Security hooks for XFRM operations.
  *
  * @xfrm_policy_alloc_security:
@@ -1611,6 +1627,12 @@ union security_list_options {
 	int (*tun_dev_open)(void *security);
 #endif	/* CONFIG_SECURITY_NETWORK */
 
+#ifdef CONFIG_SECURITY_INFINIBAND
+	int (*ib_pkey_access)(void *sec, u64 subnet_prefix, u16 pkey);
+	int (*ib_alloc_security)(void **sec);
+	void (*ib_free_security)(void *sec);
+#endif	/* CONFIG_SECURITY_INFINIBAND */
+
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 	int (*xfrm_policy_alloc_security)(struct xfrm_sec_ctx **ctxp,
 					  struct xfrm_user_sec_ctx *sec_ctx,
@@ -1841,6 +1863,11 @@ struct security_hook_heads {
 	struct list_head tun_dev_attach;
 	struct list_head tun_dev_open;
 #endif	/* CONFIG_SECURITY_NETWORK */
+#ifdef CONFIG_SECURITY_INFINIBAND
+	struct list_head ib_pkey_access;
+	struct list_head ib_alloc_security;
+	struct list_head ib_free_security;
+#endif	/* CONFIG_SECURITY_INFINIBAND */
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 	struct list_head xfrm_policy_alloc_security;
 	struct list_head xfrm_policy_clone_security;
diff --git a/include/linux/security.h b/include/linux/security.h
index c2125e9..342ca4c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -6,6 +6,7 @@
  * Copyright (C) 2001 Networks Associates Technology, Inc <ssmalley-M06CiZnz2FM@public.gmane.org>
  * Copyright (C) 2001 James Morris <jmorris-G2x6lROWQUcJY7gZg3T8ig@public.gmane.org>
  * Copyright (C) 2001 Silicon Graphics, Inc. (Trust Technology Group)
+ * Copyright (C) 2016 Mellanox Techonologies
  *
  *	This program is free software; you can redistribute it and/or modify
  *	it under the terms of the GNU General Public License as published by
@@ -1393,6 +1394,26 @@ static inline int security_tun_dev_open(void *security)
 }
 #endif	/* CONFIG_SECURITY_NETWORK */
 
+#ifdef CONFIG_SECURITY_INFINIBAND
+int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey);
+int security_ib_alloc_security(void **sec);
+void security_ib_free_security(void *sec);
+#else	/* CONFIG_SECURITY_INFINIBAND */
+static inline int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey)
+{
+	return 0;
+}
+
+static inline int security_ib_alloc_security(void **sec)
+{
+	return 0;
+}
+
+static inline void security_ib_free_security(void *sec)
+{
+}
+#endif	/* CONFIG_SECURITY_INFINIBAND */
+
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 
 int security_xfrm_policy_alloc(struct xfrm_sec_ctx **ctxp,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index db178fd..1143586 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1511,6 +1511,45 @@ struct ib_rwq_ind_table_init_attr {
 	struct ib_wq	**ind_tbl;
 };
 
+enum port_pkey_state {
+	IB_PORT_PKEY_NOT_VALID = 0,
+	IB_PORT_PKEY_VALID = 1,
+	IB_PORT_PKEY_LISTED = 2,
+};
+
+struct ib_qp_security;
+
+struct ib_port_pkey {
+	enum port_pkey_state	state;
+	u16			pkey_index;
+	u8			port_num;
+	struct list_head	qp_list;
+	struct list_head	to_error_list;
+	struct ib_qp_security  *sec;
+};
+
+struct ib_ports_pkeys {
+	struct ib_port_pkey	main;
+	struct ib_port_pkey	alt;
+};
+
+struct ib_qp_security {
+	struct ib_qp	       *qp;
+	struct ib_device       *dev;
+	/* Hold this mutex when changing port and pkey settings. */
+	struct mutex		mutex;
+	struct ib_ports_pkeys  *ports_pkeys;
+	/* A list of all open shared QP handles.  Required to enforce security
+	 * properly for all users of a shared QP.
+	 */
+	struct list_head        shared_qp_list;
+	void                   *security;
+	bool			destroying;
+	atomic_t		error_list_count;
+	struct completion	error_complete;
+	int			error_comps_pending;
+};
+
 /*
  * @max_write_sge: Maximum SGE elements per RDMA WRITE request.
  * @max_read_sge:  Maximum SGE elements per RDMA READ request.
@@ -1540,6 +1579,7 @@ struct ib_qp {
 	u32			max_read_sge;
 	enum ib_qp_type		qp_type;
 	struct ib_rwq_ind_table *rwq_ind_tbl;
+	struct ib_qp_security  *qp_sec;
 };
 
 struct ib_mr {
@@ -1820,6 +1860,12 @@ struct ib_port_immutable {
 	u32                           max_mad_size;
 };
 
+struct ib_port_pkey_list {
+	/* Lock to hold while modifying the list. */
+	spinlock_t		      list_lock;
+	struct list_head	      pkey_list;
+};
+
 struct ib_device {
 	struct device                *dma_device;
 
@@ -1842,6 +1888,8 @@ struct ib_device {
 
 	int			      num_comp_vectors;
 
+	struct ib_port_pkey_list     *port_pkey_list;
+
 	struct iw_cm_verbs	     *iwcm;
 
 	/**
diff --git a/security/Kconfig b/security/Kconfig
index 118f454..85cf893 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -49,6 +49,15 @@ config SECURITY_NETWORK
 	  implement socket and networking access controls.
 	  If you are unsure how to answer this question, answer N.
 
+config SECURITY_INFINIBAND
+	bool "Infiniband Security Hooks"
+	depends on SECURITY && INFINIBAND
+	help
+	  This enables the Infiniband security hooks.
+	  If enabled, a security module can use these hooks to
+	  implement Infiniband access controls.
+	  If you are unsure how to answer this question, answer N.
+
 config SECURITY_NETWORK_XFRM
 	bool "XFRM (IPSec) Networking Security Hooks"
 	depends on XFRM && SECURITY_NETWORK
diff --git a/security/security.c b/security/security.c
index f825304..7d3bf2f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2001 WireX Communications, Inc <chris-ZMHXrckZAt0AvxtiuMwx3w@public.gmane.org>
  * Copyright (C) 2001-2002 Greg Kroah-Hartman <greg-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
  * Copyright (C) 2001 Networks Associates Technology, Inc <ssmalley-M06CiZnz2FM@public.gmane.org>
+ * Copyright (C) 2016 Mellanox Technologies
  *
  *	This program is free software; you can redistribute it and/or modify
  *	it under the terms of the GNU General Public License as published by
@@ -1441,6 +1442,27 @@ EXPORT_SYMBOL(security_tun_dev_open);
 
 #endif	/* CONFIG_SECURITY_NETWORK */
 
+#ifdef CONFIG_SECURITY_INFINIBAND
+
+int security_ib_pkey_access(void *sec, u64 subnet_prefix, u16 pkey)
+{
+	return call_int_hook(ib_pkey_access, 0, sec, subnet_prefix, pkey);
+}
+EXPORT_SYMBOL(security_ib_pkey_access);
+
+int security_ib_alloc_security(void **sec)
+{
+	return call_int_hook(ib_alloc_security, 0, sec);
+}
+EXPORT_SYMBOL(security_ib_alloc_security);
+
+void security_ib_free_security(void *sec)
+{
+	call_void_hook(ib_free_security, sec);
+}
+EXPORT_SYMBOL(security_ib_free_security);
+#endif	/* CONFIG_SECURITY_INFINIBAND */
+
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 
 int security_xfrm_policy_alloc(struct xfrm_sec_ctx **ctxp,
@@ -1898,6 +1920,15 @@ struct security_hook_heads security_hook_heads = {
 		LIST_HEAD_INIT(security_hook_heads.tun_dev_attach),
 	.tun_dev_open =	LIST_HEAD_INIT(security_hook_heads.tun_dev_open),
 #endif	/* CONFIG_SECURITY_NETWORK */
+
+#ifdef CONFIG_SECURITY_INFINIBAND
+	.ib_pkey_access = LIST_HEAD_INIT(security_hook_heads.ib_pkey_access),
+	.ib_alloc_security =
+		LIST_HEAD_INIT(security_hook_heads.ib_alloc_security),
+	.ib_free_security =
+		LIST_HEAD_INIT(security_hook_heads.ib_free_security),
+#endif	/* CONFIG_SECURITY_INFINIBAND */
+
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 	.xfrm_policy_alloc_security =
 		LIST_HEAD_INIT(security_hook_heads.xfrm_policy_alloc_security),
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v5 1/9] IB/core: IB cache enhancements to support Infiniband security
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens
In-Reply-To: <1479843638-67136-1-git-send-email-danielj@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Cache the subnet prefix and add a function to access it. Enforcing
security requires frequent queries of the subnet prefix and the pkeys in
the pkey table.

Also removed an unneded pr_warn about memory allocation failure.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

---
v2:
- In ib_get_cached_subnet_prefix wait to initialize p until after
  validation.  Yuval Shaia
---
 drivers/infiniband/core/cache.c     | 36 ++++++++++++++++++++++++++++++++++--
 drivers/infiniband/core/core_priv.h |  3 +++
 include/rdma/ib_verbs.h             |  1 +
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 1a2984c..affc8ef 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -934,6 +934,26 @@ int ib_get_cached_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_get_cached_pkey);
 
+int ib_get_cached_subnet_prefix(struct ib_device *device,
+				u8                port_num,
+				u64              *sn_pfx)
+{
+	unsigned long flags;
+	int p;
+
+	if (port_num < rdma_start_port(device) ||
+	    port_num > rdma_end_port(device))
+		return -EINVAL;
+
+	p = port_num - rdma_start_port(device);
+	read_lock_irqsave(&device->cache.lock, flags);
+	*sn_pfx = device->cache.subnet_prefix_cache[p];
+	read_unlock_irqrestore(&device->cache.lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_get_cached_subnet_prefix);
+
 int ib_find_cached_pkey(struct ib_device *device,
 			u8                port_num,
 			u16               pkey,
@@ -1110,6 +1130,8 @@ static void ib_cache_update(struct ib_device *device,
 
 	device->cache.lmc_cache[port - rdma_start_port(device)] = tprops->lmc;
 
+	device->cache.subnet_prefix_cache[port - rdma_start_port(device)] =
+							tprops->subnet_prefix;
 	write_unlock_irq(&device->cache.lock);
 
 	kfree(gid_cache);
@@ -1168,9 +1190,18 @@ int ib_cache_setup_one(struct ib_device *device)
 					  (rdma_end_port(device) -
 					   rdma_start_port(device) + 1),
 					  GFP_KERNEL);
+
+	device->cache.subnet_prefix_cache =
+		kcalloc((rdma_end_port(device) - rdma_start_port(device) + 1),
+			sizeof(*device->cache.subnet_prefix_cache),
+			GFP_KERNEL);
+
 	if (!device->cache.pkey_cache ||
-	    !device->cache.lmc_cache) {
-		pr_warn("Couldn't allocate cache for %s\n", device->name);
+	    !device->cache.lmc_cache ||
+	    !device->cache.subnet_prefix_cache) {
+		kfree(device->cache.pkey_cache);
+		kfree(device->cache.lmc_cache);
+		kfree(device->cache.subnet_prefix_cache);
 		return -ENOMEM;
 	}
 
@@ -1213,6 +1244,7 @@ void ib_cache_release_one(struct ib_device *device)
 	gid_table_release_one(device);
 	kfree(device->cache.pkey_cache);
 	kfree(device->cache.lmc_cache);
+	kfree(device->cache.subnet_prefix_cache);
 }
 
 void ib_cache_cleanup_one(struct ib_device *device)
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 19d499d..ce826e4 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -153,4 +153,7 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb,
 int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
 			     struct netlink_callback *cb);
 
+int ib_get_cached_subnet_prefix(struct ib_device *device,
+				u8                port_num,
+				u64              *sn_pfx);
 #endif /* _CORE_PRIV_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..db178fd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1761,6 +1761,7 @@ struct ib_cache {
 	struct ib_pkey_cache  **pkey_cache;
 	struct ib_gid_table   **gid_cache;
 	u8                     *lmc_cache;
+	u64                    *subnet_prefix_cache;
 };
 
 struct ib_dma_mapping_ops {
-- 
2.7.4


^ permalink raw reply related

* [PATCH v5 0/9] SELinux support for Infiniband RDMA
From: Dan Jurgens @ 2016-11-22 19:40 UTC (permalink / raw)
  To: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock
  Cc: selinux, linux-security-module, linux-rdma, yevgenyp,
	Daniel Jurgens

From: Daniel Jurgens <danielj@mellanox.com>

Infiniband applications access HW from user-space -- traffic is generated
directly by HW, bypassing the kernel. Consequently, Infiniband Partitions,
which are associated directly with HW transport endpoints, are a natural
choice for enforcing granular mandatory access control for Infiniband. QPs
may only send or receives packets tagged with the corresponding partition
key (PKey). The PKey is not a cryptographic key; it's a 16 bit number
identifying the partition.

Every Infiniband fabric is controlled by a central Subnet Manager (SM). The SM
provisions the partitions by assigning each port with the partitions it can
access. In addition, the SM tags each port with a subnet prefix, which
identifies the subnet. Determining which users are allowed to access which
partition keys on a given subnet forms an effective policy for isolating users
on the fabric. Any application that attempts to send traffic on a given subnet
is automatically subject to the policy, regardless of which device and port it
uses. SM software configures the subnet through a privileged Subnet Management
Interface (SMI), which is presented by each Infiniband port. Thus, the SMI must
also be controlled to prevent unauthorized changes to fabric configuration and
partitioning. 

To support access control for IB partitions and subnet management, security
contexts must be provided for two new types of objects - PKeys and IB ports.

A PKey label consists of a subnet prefix and a range of PKey values and is
similar to the labeling mechanism for netports. Each Infiniband port can reside
on a different subnet. So labeling the PKey values for specific subnet prefixes
provides the user maximum flexibility, as PKey values may be determined
independently for different subnets. There is a single access vector for PKeys
called "access".

An Infiniband port is labeled by device name and port number. There is a single
access vector for IB ports called "manage_subnet".

Because RDMA allows kernel bypass, enforcement must be done during connection
setup. Communication over RDMA requires a send and receive queue, collectively
known as a Queue Pair (QP). A QP must be initialized by privileged system calls
before it can be used to send or receive data. During initialization the user
must provide the PKey and port the QP will use; at this time access control can
be enforced.

Because there is a possibility that the enforcement settings or security
policy can change, a means of notifying the ib_core module of such changes
is required. To facilitate this a generic notification callback mechanism
is added to the LSM. One callback is registered for checking the QP PKey
associations when the policy changes. Mad agents also register a callback,
they cache the permission to send and receive SMPs to avoid another per
packet call to the LSM.

Because frequent accesses to the same PKey's SID is expected a cache is
implemented which is very similar to the netport cache.

In order to properly enforce security when changes to the PKey table or
security policy or enforcement occur ib_core must track which QPs are
using which port, pkey index, and alternate path for every IB device.
This makes operations that used to be atomic transactional.

When modifying a QP, ib_core must associate it with the PKey index, port,
and alternate path specified. If the QP was already associated with
different settings, the QP is added to the new list prior to the
modification. If the modify succeeds then the old listing is removed. If
the modify fails the new listing is removed and the old listing remains
unchanged.

When destroying a QP the ib_qp structure is freed by the decive specific
driver (i.e. mlx4_ib) if the 'destroy' is successful. This requires storing
security related information in a separate structure. When a 'destroy'
request is in process the ib_qp structure is in an undefined state so if
there are changes to the security policy or PKey table, the security checks
cannot reset the QP if it doesn't have permission for the new setting. If
the 'destroy' fails, security for that QP must be enforced again and its
status in the list is restored. If the 'destroy' succeeds the security info
can be cleaned up and freed.

There are a number of locks required to protect the QP security structure
and the QP to device/port/pkey index lists. If multiple locks are required,
the safe locking order is: QP security structure mutex first, followed by
any list locks needed, which are sorted first by port followed by pkey
index.

---
v2:
- Use void* blobs in the LSM hooks. Paul Moore
- Make the policy change callback generic. Yuval Shaia, Paul Moore
- Squash LSM changes into the patches where the calls are added. Paul Moore
- Don't add new initial SIDs. Stephen Smalley
- Squash MAD agent PKey and SMI patches and move logic to IB security. Dan Jurgens
- Changed ib_end_port to ib_port. Paul Moore
- Changed ib_port access vector from smp to manage_subnet. Paul Moore
- Added pkey and ib_port details to the audit log. Paul Moore
- See individual patches for more detail.

v3:
- ib_port -> ib_endport. Paul Moore
- use notifier chains for LSM notifications. Paul Moore
- reorder parameters in hooks to put security blob first. Paul Moore
- Don't treat device name as untrusted string in audit log. Paul Moore

v4:
- Added separate AVC callback for LSM notifier. Paul Moore
- Removed unneeded braces in ocontext_read. Paul Moore

v5:
- Fix link error when CONFIG_SECURITY is not set. Build Robot
- Strip issue and Gerrit-Id: Leon Romanovsky

Daniel Jurgens (9):
  IB/core: IB cache enhancements to support Infiniband security
  IB/core: Enforce PKey security on QPs
  selinux lsm IB/core: Implement LSM notification system
  IB/core: Enforce security on management datagrams
  selinux: Create policydb version for Infiniband support
  selinux: Allocate and free infiniband security hooks
  selinux: Implement Infiniband PKey "Access" access vector
  selinux: Add IB Port SMP access vector
  selinux: Add a cache for quicker retreival of PKey SIDs

 drivers/infiniband/core/Makefile     |   3 +-
 drivers/infiniband/core/cache.c      |  57 ++-
 drivers/infiniband/core/core_priv.h  | 115 ++++++
 drivers/infiniband/core/device.c     |  86 +++++
 drivers/infiniband/core/mad.c        |  52 ++-
 drivers/infiniband/core/security.c   | 709 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_cmd.c |  20 +-
 drivers/infiniband/core/verbs.c      |  27 +-
 include/linux/lsm_audit.h            |  15 +
 include/linux/lsm_hooks.h            |  35 ++
 include/linux/security.h             |  50 +++
 include/rdma/ib_mad.h                |   4 +
 include/rdma/ib_verbs.h              |  49 +++
 security/Kconfig                     |   9 +
 security/lsm_audit.c                 |  18 +
 security/security.c                  |  59 +++
 security/selinux/Makefile            |   2 +-
 security/selinux/hooks.c             |  87 ++++-
 security/selinux/ibpkey.c            | 245 ++++++++++++
 security/selinux/include/classmap.h  |   4 +
 security/selinux/include/ibpkey.h    |  31 ++
 security/selinux/include/objsec.h    |  11 +
 security/selinux/include/security.h  |   7 +-
 security/selinux/selinuxfs.c         |   2 +
 security/selinux/ss/policydb.c       | 129 ++++++-
 security/selinux/ss/policydb.h       |  27 +-
 security/selinux/ss/services.c       |  83 ++++
 27 files changed, 1891 insertions(+), 45 deletions(-)
 create mode 100644 drivers/infiniband/core/security.c
 create mode 100644 security/selinux/ibpkey.c
 create mode 100644 security/selinux/include/ibpkey.h

-- 
2.7.4


^ permalink raw reply

* [PATCH v2 11/11] IB/srpt: Increase lid and sm_lid to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

srpt contains lid and sm_lid fields which are 16 bits in
length, increase them to 32 bits.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 4 ++--
 drivers/infiniband/ulp/srpt/ib_srpt.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 4dc66ba..0b1f69e 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -514,8 +514,8 @@ static int srpt_refresh_port(struct srpt_port *sport)
 	if (ret)
 		goto err_query_port;
 
-	sport->sm_lid = (u16)port_attr.sm_lid;
-	sport->lid = (u16)port_attr.lid;
+	sport->sm_lid = port_attr.sm_lid;
+	sport->lid = port_attr.lid;
 
 	ret = ib_query_gid(sport->sdev->device, sport->port, 0, &sport->gid,
 			   NULL);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index 5818787..580bb75 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -322,8 +322,8 @@ struct srpt_port {
 	bool			enabled;
 	u8			port_guid[64];
 	u8			port;
-	u16			sm_lid;
-	u16			lid;
+	u32			sm_lid;
+	u32			lid;
 	union ib_gid		gid;
 	struct work_struct	work;
 	struct se_portal_group	port_tpg_1;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 10/11] IB/IPoIB: Modify ipoib_get_net_dev_by_params to lookup gid table
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

ipoib_get_net_dev_by_params compares incoming gid with local_gid
which is gid at index 0 of the gid table. OPA devices using larger
LIDs may have a different GID format than whats setup in the local_gid
field. Do a search of the gid table in those cases.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 37 +++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 474c3bf..3e135df 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -328,6 +328,40 @@ static struct net_device *ipoib_get_net_dev_match_addr(
 	return result;
 }
 
+/* retuns true if the incoming gid is assigned to the IPoIB
+ * netdev interface
+ *
+ * OPA devices may have the incoming GID in the OPA GID
+ * format which might not necessarily be assigned to the
+ * netdev interface. This necessitates searching the GID
+ * table to match this OPA GID.
+ */
+static bool ipoib_check_gid(struct ipoib_dev_priv *priv,
+			    const union ib_gid *gid)
+{
+	bool is_local_gid;
+	struct ib_port_attr attr;
+	union ib_gid port_gid;
+	int i;
+
+	if (!gid)
+		return true;
+
+	is_local_gid = !memcmp(gid, &priv->local_gid, sizeof(*gid));
+
+	if (!rdma_cap_opa_ah(priv->ca, priv->port) || is_local_gid)
+		return is_local_gid;
+
+	if (ib_query_port(priv->ca, priv->port, &attr))
+		return false;
+	for (i = 1; i < attr.gid_tbl_len; i++) {
+		if (ib_query_gid(priv->ca, priv->port, i, &port_gid, NULL))
+			return false;
+		if (!memcmp(gid, &port_gid, sizeof(*gid)))
+			return true;
+	}
+	return false;
+}
 /* returns the number of IPoIB netdevs on top a given ipoib device matching a
  * pkey_index and address, if one exists.
  *
@@ -344,8 +378,7 @@ static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
 	struct net_device *net_dev = NULL;
 	int matches = 0;
 
-	if (priv->pkey_index == pkey_index &&
-	    (!gid || !memcmp(gid, &priv->local_gid, sizeof(*gid)))) {
+	if (priv->pkey_index == pkey_index && ipoib_check_gid(priv, gid)) {
 		if (!addr) {
 			net_dev = ipoib_get_master_net_dev(priv->dev);
 		} else {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 09/11] IB/IPoIB: Retrieve 32 bit LIDs from path records when running on OPA devices
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Path record responses will contain the 32 bit LID information in the
SGID and DGID field of the responses. Modify IPoIB to use these extended
LIDs in datagram and connected mode communication.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        | 11 +++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 26 ++++++++++++++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  2 +-
 include/rdma/opa_addr.h                        | 12 ++++++++++++
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7b8d2d9..fad4560 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -352,7 +352,7 @@ struct ipoib_dev_priv {
 	u32		  qkey;
 
 	union ib_gid local_gid;
-	u16	     local_lid;
+	u32	     local_lid;
 
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
@@ -421,6 +421,8 @@ struct ipoib_path {
 	struct rb_node	      rb_node;
 	struct list_head      list;
 	int  		      valid;
+	u32		      dlid;
+	u32		      slid;
 };
 
 struct ipoib_neigh {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d..c27df76 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/moduleparam.h>
+#include <rdma/opa_addr.h>
 
 #include "ipoib.h"
 
@@ -1356,6 +1357,16 @@ static void ipoib_cm_tx_start(struct work_struct *work)
 		}
 		memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);
 
+		if (rdma_cap_opa_ah(priv->ca, priv->port)) {
+			if (p->path->dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE))
+				pathrec.dgid.global.interface_id =
+					OPA_MAKE_ID(p->path->dlid);
+
+			if (p->path->slid >= be16_to_cpu(IB_MULTICAST_LID_BASE))
+				pathrec.sgid.global.interface_id =
+					OPA_MAKE_ID(p->path->slid);
+		}
+
 		spin_unlock_irqrestore(&priv->lock, flags);
 		netif_tx_unlock_bh(dev);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 5636fc3..474c3bf 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -52,6 +52,7 @@
 #include <linux/inetdevice.h>
 #include <rdma/ib_cache.h>
 #include <linux/pci.h>
+#include <rdma/opa_addr.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -766,6 +767,31 @@ static void path_rec_completion(int status,
 	spin_lock_irqsave(&priv->lock, flags);
 
 	if (!IS_ERR_OR_NULL(ah)) {
+		/*
+		 * Extended LIDs might get programmed into GIDs in the
+		 * case of OPA devices. Since we have created the ah
+		 * above which would have made use of the lids, now is
+		 * a good time to change them back to regular GIDs after
+		 * saving the extended LIDs.
+		 */
+		if (rdma_cap_opa_ah(priv->ca, priv->port) &&
+		    ib_is_opa_gid(&pathrec->sgid)) {
+			path->slid = opa_get_lid_from_gid(&pathrec->sgid);
+			pathrec->sgid = path->pathrec.sgid;
+		} else {
+			path->slid = be16_to_cpu(pathrec->slid);
+		}
+
+		if (rdma_cap_opa_ah(priv->ca, priv->port) &&
+		    ib_is_opa_gid(&pathrec->dgid)) {
+			path->dlid = opa_get_lid_from_gid(&pathrec->dgid);
+			pathrec->dgid = path->pathrec.dgid;
+		} else {
+			path->dlid = be16_to_cpu(pathrec->dlid);
+		}
+		ipoib_dbg(priv, "PathRec SGID %pI6 DGID %pI6\n",
+			  pathrec->sgid.raw, pathrec->dgid.raw);
+
 		path->pathrec = *pathrec;
 
 		old_ah   = path->ah;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index bff73b5..d3394b6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -581,7 +581,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
 			  port_attr.state);
 		return;
 	}
-	priv->local_lid = (u16)port_attr.lid;
+	priv->local_lid = port_attr.lid;
 	netif_addr_lock_bh(dev);
 
 	if (!test_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags)) {
diff --git a/include/rdma/opa_addr.h b/include/rdma/opa_addr.h
index 5c713bc..d0a37d0 100644
--- a/include/rdma/opa_addr.h
+++ b/include/rdma/opa_addr.h
@@ -53,4 +53,16 @@ static inline bool ib_is_opa_gid(union ib_gid *gid)
 	return ((be64_to_cpu(gid->global.interface_id) >> 40) ==
 		OPA_SPECIAL_OUI);
 }
+
+/**
+ * opa_get_lid_from_gid: Returns the last 32 bits of the gid.
+ * OPA devices use one of the gids in the gid table to also
+ * store the lid.
+ *
+ * @gid: The Global identifier
+ */
+static inline u32 opa_get_lid_from_gid(union ib_gid *gid)
+{
+	return be64_to_cpu(gid->global.interface_id) & 0xFFFFFFFF;
+}
 #endif /* OPA_ADDR_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 08/11] IB/SA: Program extended LID in SM Address handle
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Reuse port_attr.grh_required field to let IB SA know that extended LID
information needs to be set in the SM Address handle for OPA devices

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c | 8 +++++++-
 include/rdma/opa_addr.h            | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 81b742c..d59ee67 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -50,6 +50,7 @@
 #include <uapi/rdma/ib_user_sa.h>
 #include <rdma/ib_marshall.h>
 #include <rdma/ib_addr.h>
+#include <rdma/opa_addr.h>
 #include "sa.h"
 #include "core_priv.h"
 
@@ -964,7 +965,12 @@ static void update_sm_ah(struct work_struct *work)
 	if (port_attr.grh_required) {
 		ah_attr.ah_flags = IB_AH_GRH;
 		ah_attr.grh.dgid.global.subnet_prefix = cpu_to_be64(port_attr.subnet_prefix);
-		ah_attr.grh.dgid.global.interface_id = cpu_to_be64(IB_SA_WELL_KNOWN_GUID);
+		if (rdma_cap_opa_ah(port->agent->device, port->port_num))
+			ah_attr.grh.dgid.global.interface_id =
+				OPA_MAKE_ID(ah_attr.dlid);
+		else
+			ah_attr.grh.dgid.global.interface_id =
+				cpu_to_be64(IB_SA_WELL_KNOWN_GUID);
 	}
 
 	new_ah->ah = ib_create_ah(port->agent->qp->pd, &ah_attr);
diff --git a/include/rdma/opa_addr.h b/include/rdma/opa_addr.h
index 3e22937..5c713bc 100644
--- a/include/rdma/opa_addr.h
+++ b/include/rdma/opa_addr.h
@@ -38,6 +38,7 @@
 #define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
 				 ? 0 : x)
 #define	OPA_SPECIAL_OUI		(0x00066AULL)
+#define OPA_MAKE_ID(x)          (cpu_to_be64(OPA_SPECIAL_OUI << 40 | (x)))
 
 /**
  * ib_is_opa_gid: Returns true if the top 24 bits of the gid
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 07/11] IB/mad: Change slid in RMPP recv from 16 to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

OPA devices will contain larger lids in the wc.slid
which is now 32 bits. This change ensures RMPP handler
is able to retrieve the correct lid.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad_rmpp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 8a076e1..057c1ec 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -64,7 +64,7 @@ struct mad_rmpp_recv {
 
 	__be64 tid;
 	u32 src_qp;
-	u16 slid;
+	u32 slid;
 	u8 mgmt_class;
 	u8 class_version;
 	u8 method;
@@ -316,7 +316,7 @@ static void recv_cleanup_handler(struct work_struct *work)
 	mad_hdr = &mad_recv_wc->recv_buf.mad->mad_hdr;
 	rmpp_recv->tid = mad_hdr->tid;
 	rmpp_recv->src_qp = mad_recv_wc->wc->src_qp;
-	rmpp_recv->slid = (u16)mad_recv_wc->wc->slid;
+	rmpp_recv->slid = mad_recv_wc->wc->slid;
 	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
 	rmpp_recv->class_version = mad_hdr->class_version;
 	rmpp_recv->method  = mad_hdr->method;
@@ -337,7 +337,7 @@ static void recv_cleanup_handler(struct work_struct *work)
 	list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
 		if (rmpp_recv->tid == mad_hdr->tid &&
 		    rmpp_recv->src_qp == mad_recv_wc->wc->src_qp &&
-		    rmpp_recv->slid == (u16)mad_recv_wc->wc->slid &&
+		    rmpp_recv->slid == mad_recv_wc->wc->slid &&
 		    rmpp_recv->mgmt_class == mad_hdr->mgmt_class &&
 		    rmpp_recv->class_version == mad_hdr->class_version &&
 		    rmpp_recv->method == mad_hdr->method)
@@ -870,7 +870,7 @@ static int init_newwin(struct ib_mad_send_wr_private *mad_send_wr)
 		if (ib_query_ah(mad_send_wr->send_buf.ah, &ah_attr))
 			continue;
 
-		if (rmpp_recv->slid == (u16)ah_attr.dlid) {
+		if (rmpp_recv->slid == ah_attr.dlid) {
 			newwin = rmpp_recv->repwin;
 			break;
 		}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 06/11] IB/mad: Ensure DR MADs are correctly specified when using OPA devices
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

From: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Pure DR MADs do not need OPA GIDs to be specified in the GRH since
they do not rely on LID information.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad.c | 104 +++++++++++++++++++++++++++++++++++++-----
 include/rdma/opa_addr.h       |  17 +++++++
 2 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 40cbd6b..c0ee997 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -41,6 +41,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <rdma/ib_cache.h>
+#include <rdma/opa_addr.h>
 
 #include "mad_priv.h"
 #include "mad_rmpp.h"
@@ -731,6 +732,80 @@ static size_t mad_priv_dma_size(const struct ib_mad_private *mp)
 	return sizeof(struct ib_grh) + mp->mad_size;
 }
 
+static int verify_mad_ah(struct ib_mad_agent_private *mad_agent_priv,
+			 struct ib_mad_send_wr_private *mad_send_wr)
+{
+	struct ib_device *ib_dev = mad_agent_priv->qp_info->port_priv->device;
+	u8 port = mad_agent_priv->qp_info->port_priv->port_num;
+	struct ib_smp *smp = mad_send_wr->send_buf.mad;
+	struct opa_smp *opa_smp = (struct opa_smp *)smp;
+	u32 opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
+	u32 opa_drdlid = be32_to_cpu(opa_smp->route.dr.dr_dlid);
+
+	bool dr_slid_is_permissive = (OPA_LID_PERMISSIVE ==
+				      opa_smp->route.dr.dr_slid) ? true : false;
+	bool dr_dlid_is_permissive = (OPA_LID_PERMISSIVE ==
+				      opa_smp->route.dr.dr_dlid) ? true : false;
+	bool drslid_is_ib_ucast = (opa_drslid <
+				   be16_to_cpu(IB_MULTICAST_LID_BASE)) ?
+					true : false;
+	bool drdlid_is_ib_ucast = (opa_drdlid <
+				   be16_to_cpu(IB_MULTICAST_LID_BASE)) ?
+					true : false;
+	bool drslid_is_ext = !drslid_is_ib_ucast && !dr_slid_is_permissive;
+	bool drdlid_is_ext = !drdlid_is_ib_ucast && !dr_dlid_is_permissive;
+	bool grh_present = false;
+	struct ib_ah_attr attr;
+	union ib_gid sgid;
+	int ret = 0;
+
+	ret = ib_query_ah(mad_send_wr->send_buf.ah, &attr);
+	if (ret)
+		return ret;
+	grh_present = (attr.ah_flags & IB_AH_GRH);
+	if (grh_present) {
+		ret = ib_query_gid(ib_dev, port, attr.grh.sgid_index,
+				   &sgid, NULL);
+		if (ret)
+			return ret;
+	}
+
+	if (smp->class_version == OPA_SMP_CLASS_VERSION) {
+		/*
+		 * Conditions when GRH info should not be specified
+		 * 1. both dr_slid and dr_dlid are permissve (Pure DR)
+		 * 2. both dr_slid and dr_dlid are less than 0xc000.
+		 *
+		 * Conditions when GRH info should be specified
+		 * 1. dr_dlid is not permissive and above 0xbfff
+		 * OR
+		 * 2. dr_slid is not permissive and above 0xbfff
+		 */
+		if (grh_present) {
+			if ((dr_slid_is_permissive &&
+			     dr_dlid_is_permissive) ||
+			     (drslid_is_ib_ucast && drdlid_is_ib_ucast))
+				if (ib_is_opa_gid(&attr.grh.dgid) &&
+				    ib_is_opa_gid(&sgid))
+					return -EINVAL;
+			if (drslid_is_ext && !ib_is_opa_gid(&sgid))
+				return -EINVAL;
+			if (drdlid_is_ext &&
+			    !ib_is_opa_gid(&attr.grh.dgid))
+				return -EINVAL;
+		} else { /* There is no GRH */
+			if (drslid_is_ext || drdlid_is_ext)
+				return -EINVAL;
+		}
+	} else {
+		if (grh_present)
+			if (ib_is_opa_gid(&attr.grh.dgid) &&
+			    ib_is_opa_gid(&sgid))
+				return -EINVAL;
+	}
+	return ret;
+}
+
 /*
  * Return 0 if SMP is to be sent
  * Return 1 if SMP was consumed locally (whether or not solicited)
@@ -754,8 +829,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	size_t mad_size = port_mad_size(mad_agent_priv->qp_info->port_priv);
 	u16 out_mad_pkey_index = 0;
 	u16 drslid;
-	bool opa = rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
-				    mad_agent_priv->qp_info->port_priv->port_num);
+	bool opa_mad =
+		rdma_cap_opa_mad(mad_agent_priv->qp_info->port_priv->device,
+				 mad_agent_priv->qp_info->port_priv->port_num);
+	bool opa_ah =
+		rdma_cap_opa_ah(mad_agent_priv->qp_info->port_priv->device,
+				mad_agent_priv->qp_info->port_priv->port_num);
 
 	if (rdma_cap_ib_switch(device) &&
 	    smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
@@ -763,13 +842,21 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	else
 		port_num = mad_agent_priv->agent.port_num;
 
+	if (opa_mad && opa_ah) {
+		ret = verify_mad_ah(mad_agent_priv, mad_send_wr);
+		if (ret) {
+			dev_err(&device->dev,
+				"Error verifying MAD format\n");
+			goto out;
+		}
+	}
 	/*
 	 * Directed route handling starts if the initial LID routed part of
 	 * a request or the ending LID routed part of a response is empty.
 	 * If we are at the start of the LID routed part, don't update the
 	 * hop_ptr or hop_cnt.  See section 14.2.2, Vol 1 IB spec.
 	 */
-	if (opa && smp->class_version == OPA_SMP_CLASS_VERSION) {
+	if (opa_mad && smp->class_version == OPA_SMP_CLASS_VERSION) {
 		u32 opa_drslid;
 
 		if ((opa_get_smp_direction(opa_smp)
@@ -783,13 +870,6 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 			goto out;
 		}
 		opa_drslid = be32_to_cpu(opa_smp->route.dr.dr_slid);
-		if (opa_drslid != be32_to_cpu(OPA_LID_PERMISSIVE) &&
-		    opa_drslid & 0xffff0000) {
-			ret = -EINVAL;
-			dev_err(&device->dev, "OPA Invalid dr_slid 0x%x\n",
-			       opa_drslid);
-			goto out;
-		}
 		drslid = (u16)(opa_drslid & 0x0000ffff);
 
 		/* Check to post send on QP or process locally */
@@ -834,7 +914,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 		     send_wr->pkey_index,
 		     send_wr->port_num, &mad_wc);
 
-	if (opa && smp->base_version == OPA_MGMT_BASE_VERSION) {
+	if (opa_mad && smp->base_version == OPA_MGMT_BASE_VERSION) {
 		mad_wc.byte_len = mad_send_wr->send_buf.hdr_len
 					+ mad_send_wr->send_buf.data_len
 					+ sizeof(struct ib_grh);
@@ -891,7 +971,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
 	}
 
 	local->mad_send_wr = mad_send_wr;
-	if (opa) {
+	if (opa_mad) {
 		local->mad_send_wr->send_wr.pkey_index = out_mad_pkey_index;
 		local->return_wc_byte_len = mad_size;
 	}
diff --git a/include/rdma/opa_addr.h b/include/rdma/opa_addr.h
index 142b327..3e22937 100644
--- a/include/rdma/opa_addr.h
+++ b/include/rdma/opa_addr.h
@@ -33,6 +33,23 @@
 #if !defined(OPA_ADDR_H)
 #define OPA_ADDR_H
 
+#include <rdma/ib_verbs.h>
+
 #define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
 				 ? 0 : x)
+#define	OPA_SPECIAL_OUI		(0x00066AULL)
+
+/**
+ * ib_is_opa_gid: Returns true if the top 24 bits of the gid
+ * contains the OPA_STL_OUI identifier. This identifies that
+ * the provided gid is a special purpose GID meant to carry
+ * extended LID information.
+ *
+ * @gid: The Global identifier
+ */
+static inline bool ib_is_opa_gid(union ib_gid *gid)
+{
+	return ((be64_to_cpu(gid->global.interface_id) >> 40) ==
+		OPA_SPECIAL_OUI);
+}
 #endif /* OPA_ADDR_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 05/11] IB/core: Change wc.slid from 16 to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

From: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

lid field in ib_wc is increased to 32 bits. This enables core
components to use the larger addresses if needed.
The user ABI is unchanged and return 16 bit values when queried.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/cm.c            | 4 ++--
 drivers/infiniband/core/mad_rmpp.c      | 4 ++--
 drivers/infiniband/core/user_mad.c      | 2 +-
 drivers/infiniband/core/uverbs_cmd.c    | 2 +-
 drivers/infiniband/hw/hfi1/mad.c        | 2 +-
 drivers/infiniband/hw/hfi1/rc.c         | 2 +-
 drivers/infiniband/hw/hfi1/ruc.c        | 2 +-
 drivers/infiniband/hw/hfi1/uc.c         | 2 +-
 drivers/infiniband/hw/mlx4/mad.c        | 6 +++---
 drivers/infiniband/hw/mlx4/mcg.c        | 2 +-
 drivers/infiniband/hw/mlx5/mad.c        | 2 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c | 4 ++--
 drivers/infiniband/hw/mthca/mthca_mad.c | 2 +-
 drivers/infiniband/hw/qib/qib_rc.c      | 2 +-
 drivers/infiniband/hw/qib/qib_ruc.c     | 2 +-
 drivers/infiniband/hw/qib/qib_uc.c      | 2 +-
 drivers/infiniband/sw/rdmavt/cq.c       | 2 +-
 include/rdma/ib_verbs.h                 | 2 +-
 18 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c995255..137c4c2 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1576,7 +1576,7 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
 {
 	if (!cm_req_get_primary_subnet_local(req_msg)) {
 		if (req_msg->primary_local_lid == IB_LID_PERMISSIVE) {
-			req_msg->primary_local_lid = cpu_to_be16(wc->slid);
+			req_msg->primary_local_lid = cpu_to_be16((u16)wc->slid);
 			cm_req_set_primary_sl(req_msg, wc->sl);
 		}
 
@@ -1586,7 +1586,7 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
 
 	if (!cm_req_get_alt_subnet_local(req_msg)) {
 		if (req_msg->alt_local_lid == IB_LID_PERMISSIVE) {
-			req_msg->alt_local_lid = cpu_to_be16(wc->slid);
+			req_msg->alt_local_lid = cpu_to_be16((u16)wc->slid);
 			cm_req_set_alt_sl(req_msg, wc->sl);
 		}
 
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index c34dca3..8a076e1 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -316,7 +316,7 @@ static void recv_cleanup_handler(struct work_struct *work)
 	mad_hdr = &mad_recv_wc->recv_buf.mad->mad_hdr;
 	rmpp_recv->tid = mad_hdr->tid;
 	rmpp_recv->src_qp = mad_recv_wc->wc->src_qp;
-	rmpp_recv->slid = mad_recv_wc->wc->slid;
+	rmpp_recv->slid = (u16)mad_recv_wc->wc->slid;
 	rmpp_recv->mgmt_class = mad_hdr->mgmt_class;
 	rmpp_recv->class_version = mad_hdr->class_version;
 	rmpp_recv->method  = mad_hdr->method;
@@ -337,7 +337,7 @@ static void recv_cleanup_handler(struct work_struct *work)
 	list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
 		if (rmpp_recv->tid == mad_hdr->tid &&
 		    rmpp_recv->src_qp == mad_recv_wc->wc->src_qp &&
-		    rmpp_recv->slid == mad_recv_wc->wc->slid &&
+		    rmpp_recv->slid == (u16)mad_recv_wc->wc->slid &&
 		    rmpp_recv->mgmt_class == mad_hdr->mgmt_class &&
 		    rmpp_recv->class_version == mad_hdr->class_version &&
 		    rmpp_recv->method == mad_hdr->method)
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..2a0b928 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -229,7 +229,7 @@ static void recv_handler(struct ib_mad_agent *agent,
 	packet->mad.hdr.status	   = 0;
 	packet->mad.hdr.length	   = hdr_size(file) + mad_recv_wc->mad_len;
 	packet->mad.hdr.qpn	   = cpu_to_be32(mad_recv_wc->wc->src_qp);
-	packet->mad.hdr.lid	   = cpu_to_be16(mad_recv_wc->wc->slid);
+	packet->mad.hdr.lid	   = cpu_to_be16((u16)mad_recv_wc->wc->slid);
 	packet->mad.hdr.sl	   = mad_recv_wc->wc->sl;
 	packet->mad.hdr.path_bits  = mad_recv_wc->wc->dlid_path_bits;
 	packet->mad.hdr.pkey_index = mad_recv_wc->wc->pkey_index;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e135c08..a203cf2 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1619,7 +1619,7 @@ static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
 	tmp.src_qp		= wc->src_qp;
 	tmp.wc_flags		= wc->wc_flags;
 	tmp.pkey_index		= wc->pkey_index;
-	tmp.slid		= wc->slid;
+	tmp.slid		= (u16)wc->slid;
 	tmp.sl			= wc->sl;
 	tmp.dlid_path_bits	= wc->dlid_path_bits;
 	tmp.port_num		= wc->port_num;
diff --git a/drivers/infiniband/hw/hfi1/mad.c b/drivers/infiniband/hw/hfi1/mad.c
index 9487c9b..c76e546 100644
--- a/drivers/infiniband/hw/hfi1/mad.c
+++ b/drivers/infiniband/hw/hfi1/mad.c
@@ -3974,7 +3974,7 @@ static int opa_local_smp_check(struct hfi1_ibport *ibp,
 			       const struct ib_wc *in_wc)
 {
 	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
-	u16 slid = in_wc->slid;
+	u16 slid = (u16)in_wc->slid;
 	u16 pkey;
 
 	if (in_wc->pkey_index >= ARRAY_SIZE(ppd->pkeys))
diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index caca6f5..40ae502 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -2306,7 +2306,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
 			wc.opcode = IB_WC_RECV;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = (u16)qp->remote_ah_attr.dlid;
+		wc.slid = qp->remote_ah_attr.dlid;
 		/*
 		 * It seems that IB mandates the presence of an SL in a
 		 * work completion only for the UD transport (see section
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c
index 2fe2b2f..777cfc8 100644
--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -591,7 +591,7 @@ static void ruc_loopback(struct rvt_qp *sqp)
 	wc.byte_len = wqe->length;
 	wc.qp = &qp->ibqp;
 	wc.src_qp = qp->remote_qpn;
-	wc.slid = (u16)qp->remote_ah_attr.dlid;
+	wc.slid = qp->remote_ah_attr.dlid;
 	wc.sl = qp->remote_ah_attr.sl;
 	wc.port_num = 1;
 	/* Signal completion event if the solicited bit is set. */
diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c
index 0572dc7..5e6d1ba 100644
--- a/drivers/infiniband/hw/hfi1/uc.c
+++ b/drivers/infiniband/hw/hfi1/uc.c
@@ -451,7 +451,7 @@ void hfi1_uc_rcv(struct hfi1_packet *packet)
 		wc.status = IB_WC_SUCCESS;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = (u16)qp->remote_ah_attr.dlid;
+		wc.slid = qp->remote_ah_attr.dlid;
 		/*
 		 * It seems that IB mandates the presence of an SL in a
 		 * work completion only for the UD transport (see section
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 404ec4e..8e50de0 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -167,7 +167,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int mad_ifc_flags,
 
 		op_modifier |= 0x4;
 
-		in_modifier |= in_wc->slid << 16;
+		in_modifier |= (u16)in_wc->slid << 16;
 	}
 
 	err = mlx4_cmd_box(dev->dev, inmailbox->dma, outmailbox->dma, in_modifier,
@@ -599,7 +599,7 @@ int mlx4_ib_send_to_slave(struct mlx4_ib_dev *dev, int slave, u8 port,
 		memcpy((char *)&tun_mad->hdr.slid_mac_47_32, &(wc->smac[4]), 2);
 	} else {
 		tun_mad->hdr.sl_vid = cpu_to_be16(((u16)(wc->sl)) << 12);
-		tun_mad->hdr.slid_mac_47_32 = cpu_to_be16(wc->slid);
+		tun_mad->hdr.slid_mac_47_32 = cpu_to_be16((u16)wc->slid);
 	}
 
 	ib_dma_sync_single_for_device(&dev->ib_dev,
@@ -787,7 +787,7 @@ static int ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 		}
 	}
 
-	slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
+	slid = in_wc ? (u16)in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 
 	if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && slid == 0) {
 		forward_trap(to_mdev(ibdev), port_num, in_mad);
diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
index d46a847..a21d37f 100644
--- a/drivers/infiniband/hw/mlx4/mcg.c
+++ b/drivers/infiniband/hw/mlx4/mcg.c
@@ -244,7 +244,7 @@ static int send_mad_to_slave(int slave, struct mlx4_ib_demux_ctx *ctx,
 	wc.sl = 0;
 	wc.dlid_path_bits = 0;
 	wc.port_num = ctx->port;
-	wc.slid = (u16)ah_attr.dlid;  /* opensm lid */
+	wc.slid = ah_attr.dlid;  /* opensm lid */
 	wc.src_qp = 1;
 	return mlx4_ib_send_to_slave(dev, slave, ctx->port, IB_QPT_GSI, &wc, NULL, mad);
 }
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 39e5848..f0323b7 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -66,7 +66,7 @@ static int process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 	u16 slid;
 	int err;
 
-	slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
+	slid = in_wc ? (u16)in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 
 	if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && slid == 0)
 		return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED;
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..d07f389 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1913,7 +1913,7 @@ int mthca_MAD_IFC(struct mthca_dev *dev, int ignore_mkey, int ignore_bkey,
 			(in_wc->wc_flags & IB_WC_GRH ? 0x80 : 0);
 		MTHCA_PUT(inbox, val,               MAD_IFC_G_PATH_OFFSET);
 
-		MTHCA_PUT(inbox, in_wc->slid,       MAD_IFC_RLID_OFFSET);
+		MTHCA_PUT(inbox, (u16)in_wc->slid,       MAD_IFC_RLID_OFFSET);
 		MTHCA_PUT(inbox, in_wc->pkey_index, MAD_IFC_PKEY_OFFSET);
 
 		if (in_grh)
@@ -1921,7 +1921,7 @@ int mthca_MAD_IFC(struct mthca_dev *dev, int ignore_mkey, int ignore_bkey,
 
 		op_modifier |= 0x4;
 
-		in_modifier |= in_wc->slid << 16;
+		in_modifier |= (u16)in_wc->slid << 16;
 	}
 
 	err = mthca_cmd_box(dev, inmailbox->dma, outmailbox->dma,
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index b503d160..e9a7dd0 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -204,7 +204,7 @@ int mthca_process_mad(struct ib_device *ibdev,
 		      u16 *out_mad_pkey_index)
 {
 	int err;
-	u16 slid = in_wc ? in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
+	u16 slid = in_wc ? (u16)in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE);
 	u16 prev_lid = 0;
 	struct ib_port_attr pattr;
 	const struct ib_mad *in_mad = (const struct ib_mad *)in;
diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index 91f0d08..9d0b2bc 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -2018,7 +2018,7 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct ib_header *hdr,
 			wc.opcode = IB_WC_RECV;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = (u16)qp->remote_ah_attr.dlid;
+		wc.slid = qp->remote_ah_attr.dlid;
 		wc.sl = qp->remote_ah_attr.sl;
 		/* zero fields that are N/A */
 		wc.vendor_err = 0;
diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c
index 0b0620f..588b4ae 100644
--- a/drivers/infiniband/hw/qib/qib_ruc.c
+++ b/drivers/infiniband/hw/qib/qib_ruc.c
@@ -566,7 +566,7 @@ static void qib_ruc_loopback(struct rvt_qp *sqp)
 	wc.byte_len = wqe->length;
 	wc.qp = &qp->ibqp;
 	wc.src_qp = qp->remote_qpn;
-	wc.slid = (u16)qp->remote_ah_attr.dlid;
+	wc.slid = qp->remote_ah_attr.dlid;
 	wc.sl = qp->remote_ah_attr.sl;
 	wc.port_num = 1;
 	/* Signal completion event if the solicited bit is set. */
diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c
index 7a10748..5b2d483 100644
--- a/drivers/infiniband/hw/qib/qib_uc.c
+++ b/drivers/infiniband/hw/qib/qib_uc.c
@@ -403,7 +403,7 @@ void qib_uc_rcv(struct qib_ibport *ibp, struct ib_header *hdr,
 		wc.status = IB_WC_SUCCESS;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = (u16)qp->remote_ah_attr.dlid;
+		wc.slid = qp->remote_ah_attr.dlid;
 		wc.sl = qp->remote_ah_attr.sl;
 		/* zero fields that are N/A */
 		wc.vendor_err = 0;
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index 6d9904a..a03b240 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -105,7 +105,7 @@ void rvt_cq_enter(struct rvt_cq *cq, struct ib_wc *entry, bool solicited)
 		wc->uqueue[head].src_qp = entry->src_qp;
 		wc->uqueue[head].wc_flags = entry->wc_flags;
 		wc->uqueue[head].pkey_index = entry->pkey_index;
-		wc->uqueue[head].slid = entry->slid;
+		wc->uqueue[head].slid = (u16)entry->slid;
 		wc->uqueue[head].sl = entry->sl;
 		wc->uqueue[head].dlid_path_bits = entry->dlid_path_bits;
 		wc->uqueue[head].port_num = entry->port_num;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3d80720..b3f9130 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -896,7 +896,7 @@ struct ib_wc {
 	u32			src_qp;
 	int			wc_flags;
 	u16			pkey_index;
-	u16			slid;
+	u32			slid;
 	u8			sl;
 	u8			dlid_path_bits;
 	u8			port_num;	/* valid only for DR SMPs on switches */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 04/11] IB/core: Change port_attr.lid size from 16 to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

lid field in port_attr is increased to 32 bits. This enables core
components to use the larger addresses if needed.
The user ABI is unchanged and return 16 bit values when queried.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c           | 8 +++++---
 drivers/infiniband/hw/mlx4/alias_GUID.c        | 2 +-
 drivers/infiniband/hw/mlx4/mad.c               | 2 +-
 drivers/infiniband/hw/mthca/mthca_mad.c        | 2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c          | 2 +-
 include/rdma/ib_verbs.h                        | 2 +-
 7 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 36f5bc1..e135c08 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -515,11 +515,13 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
 	resp.bad_pkey_cntr   = attr.bad_pkey_cntr;
 	resp.qkey_viol_cntr  = attr.qkey_viol_cntr;
 	resp.pkey_tbl_len    = attr.pkey_tbl_len;
-	resp.lid 	     = attr.lid;
-	if (rdma_cap_opa_ah(ib_dev, cmd.port_num))
+	if (rdma_cap_opa_ah(ib_dev, cmd.port_num)) {
 		resp.sm_lid  = OPA_TO_IB_UCAST_LID(attr.sm_lid);
-	else
+		resp.lid  = OPA_TO_IB_UCAST_LID(attr.lid);
+	} else {
 		resp.sm_lid  = (u16)attr.sm_lid;
+		resp.lid     = (u16)attr.lid;
+	}
 	resp.lmc 	     = attr.lmc;
 	resp.max_vl_num      = attr.max_vl_num;
 	resp.sm_sl 	     = attr.sm_sl;
diff --git a/drivers/infiniband/hw/mlx4/alias_GUID.c b/drivers/infiniband/hw/mlx4/alias_GUID.c
index 5e99390..7fa64e6 100644
--- a/drivers/infiniband/hw/mlx4/alias_GUID.c
+++ b/drivers/infiniband/hw/mlx4/alias_GUID.c
@@ -527,7 +527,7 @@ static int set_guid_rec(struct ib_device *ibdev,
 
 	memset(&guid_info_rec, 0, sizeof (struct ib_sa_guidinfo_rec));
 
-	guid_info_rec.lid = cpu_to_be16(attr.lid);
+	guid_info_rec.lid = cpu_to_be16((u16)attr.lid);
 	guid_info_rec.block_num = index;
 
 	memcpy(guid_info_rec.guid_info_list, rec_det->all_recs,
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1672907..404ec4e 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -821,7 +821,7 @@ static int ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 	    in_mad->mad_hdr.method == IB_MGMT_METHOD_SET &&
 	    in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO &&
 	    !ib_query_port(ibdev, port_num, &pattr))
-		prev_lid = pattr.lid;
+		prev_lid = (u16)pattr.lid;
 
 	err = mlx4_MAD_IFC(to_mdev(ibdev),
 			   (mad_flags & IB_MAD_IGNORE_MKEY ? MLX4_MAD_IFC_IGNORE_MKEY : 0) |
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 9139405..b503d160 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -255,7 +255,7 @@ int mthca_process_mad(struct ib_device *ibdev,
 	    in_mad->mad_hdr.method == IB_MGMT_METHOD_SET &&
 	    in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO &&
 	    !ib_query_port(ibdev, port_num, &pattr))
-		prev_lid = pattr.lid;
+		prev_lid = (u16)pattr.lid;
 
 	err = mthca_MAD_IFC(to_mdev(ibdev),
 			    mad_flags & IB_MAD_IGNORE_MKEY,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index d3394b6..bff73b5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -581,7 +581,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
 			  port_attr.state);
 		return;
 	}
-	priv->local_lid = port_attr.lid;
+	priv->local_lid = (u16)port_attr.lid;
 	netif_addr_lock_bh(dev);
 
 	if (!test_bit(IPOIB_FLAG_DEV_ADDR_SET, &priv->flags)) {
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index c6d0c47..4dc66ba 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -515,7 +515,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
 		goto err_query_port;
 
 	sport->sm_lid = (u16)port_attr.sm_lid;
-	sport->lid = port_attr.lid;
+	sport->lid = (u16)port_attr.lid;
 
 	ret = ib_query_gid(sport->sdev->device, sport->port, 0, &sport->gid,
 			   NULL);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 294d3ed..3d80720 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -520,7 +520,7 @@ struct ib_port_attr {
 	u32			bad_pkey_cntr;
 	u32			qkey_viol_cntr;
 	u16			pkey_tbl_len;
-	u16			lid;
+	u32			lid;
 	u32			sm_lid;
 	u8			lmc;
 	u8			max_vl_num;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 03/11] IB/core: Change ah_attr.dlid from 16 to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

dlid field in ah_attr is increased to 32 bits. This
enables core components to use the larger addresses if needed.
The user ABI is unchanged and userspace applications can use
16 bit lids when creating and modifying address handles.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/mad_rmpp.c        |  2 +-
 drivers/infiniband/core/sa_query.c        |  2 +-
 drivers/infiniband/core/uverbs_cmd.c      | 11 +++++++++--
 drivers/infiniband/core/uverbs_marshall.c |  2 +-
 drivers/infiniband/hw/hfi1/driver.c       |  4 ++--
 drivers/infiniband/hw/hfi1/rc.c           |  4 ++--
 drivers/infiniband/hw/hfi1/ruc.c          | 21 +++++++++++----------
 drivers/infiniband/hw/hfi1/uc.c           |  2 +-
 drivers/infiniband/hw/hfi1/ud.c           | 10 +++++-----
 drivers/infiniband/hw/hfi1/verbs.c        |  4 ++--
 drivers/infiniband/hw/mlx4/ah.c           |  2 +-
 drivers/infiniband/hw/mlx4/mcg.c          |  2 +-
 drivers/infiniband/hw/mlx4/qp.c           |  2 +-
 drivers/infiniband/hw/mlx5/ah.c           |  2 +-
 drivers/infiniband/hw/mthca/mthca_av.c    |  2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c    |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c  |  2 +-
 drivers/infiniband/hw/qib/qib_rc.c        |  4 ++--
 drivers/infiniband/hw/qib/qib_ruc.c       | 11 ++++++-----
 drivers/infiniband/hw/qib/qib_uc.c        |  2 +-
 drivers/infiniband/hw/qib/qib_ud.c        |  8 ++++----
 include/rdma/ib_verbs.h                   |  2 +-
 22 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 382941b..c34dca3 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -870,7 +870,7 @@ static int init_newwin(struct ib_mad_send_wr_private *mad_send_wr)
 		if (ib_query_ah(mad_send_wr->send_buf.ah, &ah_attr))
 			continue;
 
-		if (rmpp_recv->slid == ah_attr.dlid) {
+		if (rmpp_recv->slid == (u16)ah_attr.dlid) {
 			newwin = rmpp_recv->repwin;
 			break;
 		}
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 0b0dc43..81b742c 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
 		pr_err("Couldn't find index for default PKey\n");
 
 	memset(&ah_attr, 0, sizeof ah_attr);
-	ah_attr.dlid     = (u16)port_attr.sm_lid;
+	ah_attr.dlid     = port_attr.sm_lid;
 	ah_attr.sl       = port_attr.sm_sl;
 	ah_attr.port_num = port->port_num;
 	if (port_attr.grh_required) {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7630c92..36f5bc1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2281,7 +2281,10 @@ ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
 	resp.dest.sgid_index        = attr->ah_attr.grh.sgid_index;
 	resp.dest.hop_limit         = attr->ah_attr.grh.hop_limit;
 	resp.dest.traffic_class     = attr->ah_attr.grh.traffic_class;
-	resp.dest.dlid              = attr->ah_attr.dlid;
+	if (rdma_cap_opa_ah(ib_dev, attr->ah_attr.port_num))
+		resp.dest.dlid	    = OPA_TO_IB_UCAST_LID(attr->ah_attr.dlid);
+	else
+		resp.dest.dlid	    = (u16)attr->ah_attr.dlid;
 	resp.dest.sl                = attr->ah_attr.sl;
 	resp.dest.src_path_bits     = attr->ah_attr.src_path_bits;
 	resp.dest.static_rate       = attr->ah_attr.static_rate;
@@ -2293,7 +2296,11 @@ ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file,
 	resp.alt_dest.sgid_index    = attr->alt_ah_attr.grh.sgid_index;
 	resp.alt_dest.hop_limit     = attr->alt_ah_attr.grh.hop_limit;
 	resp.alt_dest.traffic_class = attr->alt_ah_attr.grh.traffic_class;
-	resp.alt_dest.dlid          = attr->alt_ah_attr.dlid;
+	if (rdma_cap_opa_ah(ib_dev, attr->alt_ah_attr.port_num))
+		resp.alt_dest.dlid  =
+			OPA_TO_IB_UCAST_LID(attr->alt_ah_attr.dlid);
+	else
+		resp.alt_dest.dlid  = (u16)attr->alt_ah_attr.dlid;
 	resp.alt_dest.sl            = attr->alt_ah_attr.sl;
 	resp.alt_dest.src_path_bits = attr->alt_ah_attr.src_path_bits;
 	resp.alt_dest.static_rate   = attr->alt_ah_attr.static_rate;
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
index af020f8..f8c9008 100644
--- a/drivers/infiniband/core/uverbs_marshall.c
+++ b/drivers/infiniband/core/uverbs_marshall.c
@@ -42,7 +42,7 @@ void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
 	dst->grh.hop_limit         = src->grh.hop_limit;
 	dst->grh.traffic_class     = src->grh.traffic_class;
 	memset(&dst->grh.reserved, 0, sizeof(dst->grh.reserved));
-	dst->dlid 	    	   = src->dlid;
+	dst->dlid		   = (u16)src->dlid;
 	dst->sl   	    	   = src->sl;
 	dst->src_path_bits 	   = src->src_path_bits;
 	dst->static_rate   	   = src->static_rate;
diff --git a/drivers/infiniband/hw/hfi1/driver.c b/drivers/infiniband/hw/hfi1/driver.c
index 6563e4d..4195ba8 100644
--- a/drivers/infiniband/hw/hfi1/driver.c
+++ b/drivers/infiniband/hw/hfi1/driver.c
@@ -473,12 +473,12 @@ void hfi1_process_ecn_slowpath(struct rvt_qp *qp, struct hfi1_packet *pkt,
 			(dlid != be16_to_cpu(IB_LID_PERMISSIVE));
 		break;
 	case IB_QPT_UC:
-		rlid = qp->remote_ah_attr.dlid;
+		rlid = (u16)qp->remote_ah_attr.dlid;
 		rqpn = qp->remote_qpn;
 		svc_type = IB_CC_SVCTYPE_UC;
 		break;
 	case IB_QPT_RC:
-		rlid = qp->remote_ah_attr.dlid;
+		rlid = (u16)qp->remote_ah_attr.dlid;
 		rqpn = qp->remote_qpn;
 		svc_type = IB_CC_SVCTYPE_RC;
 		break;
diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 8bc5013..caca6f5 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -890,7 +890,7 @@ void hfi1_send_rc_ack(struct hfi1_ctxtdata *rcd, struct rvt_qp *qp,
 	pbc_flags |= ((!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT);
 	lrh0 |= (sc5 & 0xf) << 12 | (qp->remote_ah_attr.sl & 0xf) << 4;
 	hdr.lrh[0] = cpu_to_be16(lrh0);
-	hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	hdr.lrh[1] = cpu_to_be16((u16)qp->remote_ah_attr.dlid);
 	hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
 	hdr.lrh[3] = cpu_to_be16(ppd->lid | qp->remote_ah_attr.src_path_bits);
 	ohdr->bth[0] = cpu_to_be32(bth0);
@@ -2306,7 +2306,7 @@ void hfi1_rc_rcv(struct hfi1_packet *packet)
 			wc.opcode = IB_WC_RECV;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = qp->remote_ah_attr.dlid;
+		wc.slid = (u16)qp->remote_ah_attr.dlid;
 		/*
 		 * It seems that IB mandates the presence of an SL in a
 		 * work completion only for the UD transport (see section
diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c
index a1576ae..2fe2b2f 100644
--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -297,7 +297,7 @@ int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct ib_header *hdr,
 			goto err;
 		}
 		/* Validate the SLID. See Ch. 9.6.1.5 and 17.2.8 */
-		if (be16_to_cpu(hdr->lrh[3]) != qp->alt_ah_attr.dlid ||
+		if (be16_to_cpu(hdr->lrh[3]) != (u16)qp->alt_ah_attr.dlid ||
 		    ppd_from_ibp(ibp)->port != qp->alt_ah_attr.port_num)
 			goto err;
 		spin_lock_irqsave(&qp->s_lock, flags);
@@ -332,7 +332,7 @@ int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct ib_header *hdr,
 			goto err;
 		}
 		/* Validate the SLID. See Ch. 9.6.1.5 */
-		if (be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid ||
+		if (be16_to_cpu(hdr->lrh[3]) != (u16)qp->remote_ah_attr.dlid ||
 		    ppd_from_ibp(ibp)->port != qp->port_num)
 			goto err;
 		if (qp->s_mig_state == IB_MIG_REARM &&
@@ -591,7 +591,7 @@ static void ruc_loopback(struct rvt_qp *sqp)
 	wc.byte_len = wqe->length;
 	wc.qp = &qp->ibqp;
 	wc.src_qp = qp->remote_qpn;
-	wc.slid = qp->remote_ah_attr.dlid;
+	wc.slid = (u16)qp->remote_ah_attr.dlid;
 	wc.sl = qp->remote_ah_attr.sl;
 	wc.port_num = 1;
 	/* Signal completion event if the solicited bit is set. */
@@ -812,7 +812,8 @@ void hfi1_make_ruc_header(struct rvt_qp *qp, struct ib_other_headers *ohdr,
 	else
 		qp->s_flags &= ~RVT_S_AHG_VALID;
 	ps->s_txreq->phdr.hdr.lrh[0] = cpu_to_be16(lrh0);
-	ps->s_txreq->phdr.hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	ps->s_txreq->phdr.hdr.lrh[1] =
+		cpu_to_be16((u16)qp->remote_ah_attr.dlid);
 	ps->s_txreq->phdr.hdr.lrh[2] =
 		cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
 	ps->s_txreq->phdr.hdr.lrh[3] = cpu_to_be16(ppd_from_ibp(ibp)->lid |
@@ -864,9 +865,9 @@ void hfi1_do_send(struct rvt_qp *qp)
 
 	switch (qp->ibqp.qp_type) {
 	case IB_QPT_RC:
-		if (!loopback && ((qp->remote_ah_attr.dlid & ~((1 << ps.ppd->lmc
-								) - 1)) ==
-				 ps.ppd->lid)) {
+		if (!loopback && (((u16)qp->remote_ah_attr.dlid &
+				   ~((1 << ps.ppd->lmc) - 1)) ==
+				  ps.ppd->lid)) {
 			ruc_loopback(qp);
 			return;
 		}
@@ -874,9 +875,9 @@ void hfi1_do_send(struct rvt_qp *qp)
 		timeout_int = (qp->timeout_jiffies);
 		break;
 	case IB_QPT_UC:
-		if (!loopback && ((qp->remote_ah_attr.dlid & ~((1 << ps.ppd->lmc
-								) - 1)) ==
-				 ps.ppd->lid)) {
+		if (!loopback && (((u16)qp->remote_ah_attr.dlid &
+				   ~((1 << ps.ppd->lmc) - 1)) ==
+				  ps.ppd->lid)) {
 			ruc_loopback(qp);
 			return;
 		}
diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c
index 5e6d1ba..0572dc7 100644
--- a/drivers/infiniband/hw/hfi1/uc.c
+++ b/drivers/infiniband/hw/hfi1/uc.c
@@ -451,7 +451,7 @@ void hfi1_uc_rcv(struct hfi1_packet *packet)
 		wc.status = IB_WC_SUCCESS;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = qp->remote_ah_attr.dlid;
+		wc.slid = (u16)qp->remote_ah_attr.dlid;
 		/*
 		 * It seems that IB mandates the presence of an SL in a
 		 * work completion only for the UD transport (see section
diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c
index 97ae24b..8a27ab1 100644
--- a/drivers/infiniband/hw/hfi1/ud.c
+++ b/drivers/infiniband/hw/hfi1/ud.c
@@ -113,7 +113,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
 			hfi1_bad_pqkey(ibp, OPA_TRAP_BAD_P_KEY, pkey,
 				       ah_attr->sl,
 				       sqp->ibqp.qp_num, qp->ibqp.qp_num,
-				       slid, ah_attr->dlid);
+				       slid, (u16)ah_attr->dlid);
 			goto drop;
 		}
 	}
@@ -137,7 +137,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
 				       ah_attr->sl,
 				       sqp->ibqp.qp_num, qp->ibqp.qp_num,
 				       lid,
-				       ah_attr->dlid);
+				       (u16)ah_attr->dlid);
 			goto drop;
 		}
 	}
@@ -248,7 +248,7 @@ static void ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
 	if (wc.slid == 0 && sqp->ibqp.qp_type == IB_QPT_GSI)
 		wc.slid = be16_to_cpu(IB_LID_PERMISSIVE);
 	wc.sl = ah_attr->sl;
-	wc.dlid_path_bits = ah_attr->dlid & ((1 << ppd->lmc) - 1);
+	wc.dlid_path_bits = (u16)ah_attr->dlid & ((1 << ppd->lmc) - 1);
 	wc.port_num = qp->port_num;
 	/* Signal completion event if the solicited bit is set. */
 	rvt_cq_enter(ibcq_to_rvtcq(qp->ibqp.recv_cq), &wc,
@@ -321,7 +321,7 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
 	ah_attr = &ibah_to_rvtah(wqe->ud_wr.ah)->attr;
 	if (ah_attr->dlid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
 	    ah_attr->dlid == be16_to_cpu(IB_LID_PERMISSIVE)) {
-		lid = ah_attr->dlid & ~((1 << ppd->lmc) - 1);
+		lid = (u16)ah_attr->dlid & ~((1 << ppd->lmc) - 1);
 		if (unlikely(!loopback &&
 			     (lid == ppd->lid ||
 			      (lid == be16_to_cpu(IB_LID_PERMISSIVE) &&
@@ -402,7 +402,7 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
 	priv->s_sendcontext = qp_to_send_context(qp, priv->s_sc);
 	ps->s_txreq->psc = priv->s_sendcontext;
 	ps->s_txreq->phdr.hdr.lrh[0] = cpu_to_be16(lrh0);
-	ps->s_txreq->phdr.hdr.lrh[1] = cpu_to_be16(ah_attr->dlid);
+	ps->s_txreq->phdr.hdr.lrh[1] = cpu_to_be16((u16)ah_attr->dlid);
 	ps->s_txreq->phdr.hdr.lrh[2] =
 		cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
 	if (ah_attr->dlid == be16_to_cpu(IB_LID_PERMISSIVE)) {
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 4b7a16c..a13bfdf 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1781,12 +1781,12 @@ void hfi1_cnp_rcv(struct hfi1_packet *packet)
 
 	switch (packet->qp->ibqp.qp_type) {
 	case IB_QPT_UC:
-		rlid = qp->remote_ah_attr.dlid;
+		rlid = (u16)qp->remote_ah_attr.dlid;
 		rqpn = qp->remote_qpn;
 		svc_type = IB_CC_SVCTYPE_UC;
 		break;
 	case IB_QPT_RC:
-		rlid = qp->remote_ah_attr.dlid;
+		rlid = (u16)qp->remote_ah_attr.dlid;
 		rqpn = qp->remote_qpn;
 		svc_type = IB_CC_SVCTYPE_RC;
 		break;
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 5fc6233..772789f 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -58,7 +58,7 @@ static struct ib_ah *create_ib_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
 		memcpy(ah->av.ib.dgid, ah_attr->grh.dgid.raw, 16);
 	}
 
-	ah->av.ib.dlid    = cpu_to_be16(ah_attr->dlid);
+	ah->av.ib.dlid    = cpu_to_be16((u16)ah_attr->dlid);
 	if (ah_attr->static_rate) {
 		ah->av.ib.stat_rate = ah_attr->static_rate + MLX4_STAT_RATE_OFFSET;
 		while (ah->av.ib.stat_rate > IB_RATE_2_5_GBPS + MLX4_STAT_RATE_OFFSET &&
diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
index a21d37f..d46a847 100644
--- a/drivers/infiniband/hw/mlx4/mcg.c
+++ b/drivers/infiniband/hw/mlx4/mcg.c
@@ -244,7 +244,7 @@ static int send_mad_to_slave(int slave, struct mlx4_ib_demux_ctx *ctx,
 	wc.sl = 0;
 	wc.dlid_path_bits = 0;
 	wc.port_num = ctx->port;
-	wc.slid = ah_attr.dlid;  /* opensm lid */
+	wc.slid = (u16)ah_attr.dlid;  /* opensm lid */
 	wc.src_qp = 1;
 	return mlx4_ib_send_to_slave(dev, slave, ctx->port, IB_QPT_GSI, &wc, NULL, mad);
 }
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 570bc86..ada1ecc 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1396,7 +1396,7 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 
 
 	path->grh_mylmc     = ah->src_path_bits & 0x7f;
-	path->rlid	    = cpu_to_be16(ah->dlid);
+	path->rlid	    = cpu_to_be16((u16)ah->dlid);
 	if (ah->static_rate) {
 		path->static_rate = ah->static_rate + MLX4_STAT_RATE_OFFSET;
 		while (path->static_rate > IB_RATE_2_5_GBPS + MLX4_STAT_RATE_OFFSET &&
diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 745efa4..6248542 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -56,7 +56,7 @@ static struct ib_ah *create_ib_ah(struct mlx5_ib_dev *dev,
 						ah_attr->grh.sgid_index);
 		ah->av.stat_rate_sl |= (ah_attr->sl & 0x7) << 1;
 	} else {
-		ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+		ah->av.rlid = cpu_to_be16((u16)ah_attr->dlid);
 		ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
 		ah->av.stat_rate_sl |= (ah_attr->sl & 0xf);
 	}
diff --git a/drivers/infiniband/hw/mthca/mthca_av.c b/drivers/infiniband/hw/mthca/mthca_av.c
index bcac294..9e0c5c8 100644
--- a/drivers/infiniband/hw/mthca/mthca_av.c
+++ b/drivers/infiniband/hw/mthca/mthca_av.c
@@ -200,7 +200,7 @@ int mthca_create_ah(struct mthca_dev *dev,
 
 	av->port_pd = cpu_to_be32(pd->pd_num | (ah_attr->port_num << 24));
 	av->g_slid  = ah_attr->src_path_bits;
-	av->dlid    = cpu_to_be16(ah_attr->dlid);
+	av->dlid    = cpu_to_be16((u16)ah_attr->dlid);
 	av->msg_sr  = (3 << 4) | /* 2K message */
 		mthca_get_rate(dev, ah_attr->static_rate, ah_attr->port_num);
 	av->sl_tclass_flowlabel = cpu_to_be32(ah_attr->sl << 28);
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 96e5fb9..32d000f 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -516,7 +516,7 @@ static int mthca_path_set(struct mthca_dev *dev, const struct ib_ah_attr *ah,
 			  struct mthca_qp_path *path, u8 port)
 {
 	path->g_mylmc     = ah->src_path_bits & 0x7f;
-	path->rlid        = cpu_to_be16(ah->dlid);
+	path->rlid        = cpu_to_be16((u16)ah->dlid);
 	path->static_rate = mthca_get_rate(dev, ah->static_rate, port);
 
 	if (ah->ah_flags & IB_AH_GRH) {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 797362a..ea90304 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -215,7 +215,7 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 
 	/* if pd is for the user process, pass the ah_id to user space */
 	if ((pd->uctx) && (pd->uctx->ah_tbl.va)) {
-		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
+		ahid_addr = pd->uctx->ah_tbl.va + (u16)attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
 		if (ocrdma_is_udp_encap_supported(dev)) {
diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index 2097512..91f0d08 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -665,7 +665,7 @@ void qib_send_rc_ack(struct rvt_qp *qp)
 	lrh0 |= ibp->sl_to_vl[qp->remote_ah_attr.sl] << 12 |
 		qp->remote_ah_attr.sl << 4;
 	hdr.lrh[0] = cpu_to_be16(lrh0);
-	hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	hdr.lrh[1] = cpu_to_be16((u16)qp->remote_ah_attr.dlid);
 	hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
 	hdr.lrh[3] = cpu_to_be16(ppd->lid | qp->remote_ah_attr.src_path_bits);
 	ohdr->bth[0] = cpu_to_be32(bth0);
@@ -2018,7 +2018,7 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct ib_header *hdr,
 			wc.opcode = IB_WC_RECV;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = qp->remote_ah_attr.dlid;
+		wc.slid = (u16)qp->remote_ah_attr.dlid;
 		wc.sl = qp->remote_ah_attr.sl;
 		/* zero fields that are N/A */
 		wc.vendor_err = 0;
diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c
index de1bde5..0b0620f 100644
--- a/drivers/infiniband/hw/qib/qib_ruc.c
+++ b/drivers/infiniband/hw/qib/qib_ruc.c
@@ -297,7 +297,7 @@ int qib_ruc_check_hdr(struct qib_ibport *ibp, struct ib_header *hdr,
 			goto err;
 		}
 		/* Validate the SLID. See Ch. 9.6.1.5 and 17.2.8 */
-		if (be16_to_cpu(hdr->lrh[3]) != qp->alt_ah_attr.dlid ||
+		if (be16_to_cpu(hdr->lrh[3]) != (u16)qp->alt_ah_attr.dlid ||
 		    ppd_from_ibp(ibp)->port != qp->alt_ah_attr.port_num)
 			goto err;
 		spin_lock_irqsave(&qp->s_lock, flags);
@@ -330,7 +330,7 @@ int qib_ruc_check_hdr(struct qib_ibport *ibp, struct ib_header *hdr,
 			goto err;
 		}
 		/* Validate the SLID. See Ch. 9.6.1.5 */
-		if (be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid ||
+		if (be16_to_cpu(hdr->lrh[3]) != (u16)qp->remote_ah_attr.dlid ||
 		    ppd_from_ibp(ibp)->port != qp->port_num)
 			goto err;
 		if (qp->s_mig_state == IB_MIG_REARM &&
@@ -566,7 +566,7 @@ static void qib_ruc_loopback(struct rvt_qp *sqp)
 	wc.byte_len = wqe->length;
 	wc.qp = &qp->ibqp;
 	wc.src_qp = qp->remote_qpn;
-	wc.slid = qp->remote_ah_attr.dlid;
+	wc.slid = (u16)qp->remote_ah_attr.dlid;
 	wc.sl = qp->remote_ah_attr.sl;
 	wc.port_num = 1;
 	/* Signal completion event if the solicited bit is set. */
@@ -702,7 +702,7 @@ void qib_make_ruc_header(struct rvt_qp *qp, struct ib_other_headers *ohdr,
 	lrh0 |= ibp->sl_to_vl[qp->remote_ah_attr.sl] << 12 |
 		qp->remote_ah_attr.sl << 4;
 	priv->s_hdr->lrh[0] = cpu_to_be16(lrh0);
-	priv->s_hdr->lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	priv->s_hdr->lrh[1] = cpu_to_be16((u16)qp->remote_ah_attr.dlid);
 	priv->s_hdr->lrh[2] =
 			cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
 	priv->s_hdr->lrh[3] = cpu_to_be16(ppd_from_ibp(ibp)->lid |
@@ -744,7 +744,8 @@ void qib_do_send(struct rvt_qp *qp)
 
 	if ((qp->ibqp.qp_type == IB_QPT_RC ||
 	     qp->ibqp.qp_type == IB_QPT_UC) &&
-	    (qp->remote_ah_attr.dlid & ~((1 << ppd->lmc) - 1)) == ppd->lid) {
+	    (((u16)qp->remote_ah_attr.dlid) & ~((1 << ppd->lmc) - 1))
+	    == ppd->lid) {
 		qib_ruc_loopback(qp);
 		return;
 	}
diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c
index 5b2d483..7a10748 100644
--- a/drivers/infiniband/hw/qib/qib_uc.c
+++ b/drivers/infiniband/hw/qib/qib_uc.c
@@ -403,7 +403,7 @@ void qib_uc_rcv(struct qib_ibport *ibp, struct ib_header *hdr,
 		wc.status = IB_WC_SUCCESS;
 		wc.qp = &qp->ibqp;
 		wc.src_qp = qp->remote_qpn;
-		wc.slid = qp->remote_ah_attr.dlid;
+		wc.slid = (u16)qp->remote_ah_attr.dlid;
 		wc.sl = qp->remote_ah_attr.sl;
 		/* zero fields that are N/A */
 		wc.vendor_err = 0;
diff --git a/drivers/infiniband/hw/qib/qib_ud.c b/drivers/infiniband/hw/qib/qib_ud.c
index f45cad1..37cd128 100644
--- a/drivers/infiniband/hw/qib/qib_ud.c
+++ b/drivers/infiniband/hw/qib/qib_ud.c
@@ -98,7 +98,7 @@ static void qib_ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
 				      ah_attr->sl,
 				      sqp->ibqp.qp_num, qp->ibqp.qp_num,
 				      cpu_to_be16(lid),
-				      cpu_to_be16(ah_attr->dlid));
+				      cpu_to_be16((u16)ah_attr->dlid));
 			goto drop;
 		}
 	}
@@ -122,7 +122,7 @@ static void qib_ud_loopback(struct rvt_qp *sqp, struct rvt_swqe *swqe)
 				      ah_attr->sl,
 				      sqp->ibqp.qp_num, qp->ibqp.qp_num,
 				      cpu_to_be16(lid),
-				      cpu_to_be16(ah_attr->dlid));
+				      cpu_to_be16((u16)ah_attr->dlid));
 			goto drop;
 		}
 	}
@@ -296,7 +296,7 @@ int qib_make_ud_req(struct rvt_qp *qp, unsigned long *flags)
 			this_cpu_inc(ibp->pmastats->n_unicast_xmit);
 	} else {
 		this_cpu_inc(ibp->pmastats->n_unicast_xmit);
-		lid = ah_attr->dlid & ~((1 << ppd->lmc) - 1);
+		lid = ((u16)ah_attr->dlid) & ~((1 << ppd->lmc) - 1);
 		if (unlikely(lid == ppd->lid)) {
 			unsigned long tflags = *flags;
 			/*
@@ -363,7 +363,7 @@ int qib_make_ud_req(struct rvt_qp *qp, unsigned long *flags)
 	else
 		lrh0 |= ibp->sl_to_vl[ah_attr->sl] << 12;
 	priv->s_hdr->lrh[0] = cpu_to_be16(lrh0);
-	priv->s_hdr->lrh[1] = cpu_to_be16(ah_attr->dlid);  /* DEST LID */
+	priv->s_hdr->lrh[1] = cpu_to_be16((u16)ah_attr->dlid);  /* DEST LID */
 	priv->s_hdr->lrh[2] =
 			cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
 	lid = ppd->lid;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 52216f6..294d3ed 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -814,7 +814,7 @@ struct ib_mr_status {
 
 struct ib_ah_attr {
 	struct ib_global_route	grh;
-	u16			dlid;
+	u32			dlid;
 	u8			sl;
 	u8			src_path_bits;
 	u8			static_rate;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 02/11] IB/core: Change port_attr.sm_lid from 16 to 32 bits
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

sm_lid field in port_attr is increased to 32 bits. This
enables core components to use the larger addresses if needed.
The user ABI is unchanged and return 16 bit values when queried.

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/sa_query.c    |  2 +-
 drivers/infiniband/core/uverbs_cmd.c  |  6 +++++-
 drivers/infiniband/ulp/srpt/ib_srpt.c |  2 +-
 include/rdma/ib_verbs.h               |  2 +-
 include/rdma/opa_addr.h               | 38 +++++++++++++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 include/rdma/opa_addr.h

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 81b742c..0b0dc43 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
 		pr_err("Couldn't find index for default PKey\n");
 
 	memset(&ah_attr, 0, sizeof ah_attr);
-	ah_attr.dlid     = port_attr.sm_lid;
+	ah_attr.dlid     = (u16)port_attr.sm_lid;
 	ah_attr.sl       = port_attr.sm_sl;
 	ah_attr.port_num = port->port_num;
 	if (port_attr.grh_required) {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..7630c92 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -39,6 +39,7 @@
 #include <linux/sched.h>
 
 #include <asm/uaccess.h>
+#include <rdma/opa_addr.h>
 
 #include "uverbs.h"
 #include "core_priv.h"
@@ -515,7 +516,10 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
 	resp.qkey_viol_cntr  = attr.qkey_viol_cntr;
 	resp.pkey_tbl_len    = attr.pkey_tbl_len;
 	resp.lid 	     = attr.lid;
-	resp.sm_lid 	     = attr.sm_lid;
+	if (rdma_cap_opa_ah(ib_dev, cmd.port_num))
+		resp.sm_lid  = OPA_TO_IB_UCAST_LID(attr.sm_lid);
+	else
+		resp.sm_lid  = (u16)attr.sm_lid;
 	resp.lmc 	     = attr.lmc;
 	resp.max_vl_num      = attr.max_vl_num;
 	resp.sm_sl 	     = attr.sm_sl;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 0b1f69e..c6d0c47 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -514,7 +514,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
 	if (ret)
 		goto err_query_port;
 
-	sport->sm_lid = port_attr.sm_lid;
+	sport->sm_lid = (u16)port_attr.sm_lid;
 	sport->lid = port_attr.lid;
 
 	ret = ib_query_gid(sport->sdev->device, sport->port, 0, &sport->gid,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 30fa96e..52216f6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -521,7 +521,7 @@ struct ib_port_attr {
 	u32			qkey_viol_cntr;
 	u16			pkey_tbl_len;
 	u16			lid;
-	u16			sm_lid;
+	u32			sm_lid;
 	u8			lmc;
 	u8			max_vl_num;
 	u8			sm_sl;
diff --git a/include/rdma/opa_addr.h b/include/rdma/opa_addr.h
new file mode 100644
index 0000000..142b327
--- /dev/null
+++ b/include/rdma/opa_addr.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2016 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(OPA_ADDR_H)
+#define OPA_ADDR_H
+
+#define OPA_TO_IB_UCAST_LID(x)	(((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
+				 ? 0 : x)
+#endif /* OPA_ADDR_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 01/11] IB/core: Add rdma_cap_opa_ah to expose opa address handles
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford
In-Reply-To: <1479843532-47496-1-git-send-email-dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

rdma_cap_opa_ah(..) enables core components to check if the
corresponding port supports extended addresses

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Don Hiatt <don.hiatt-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 include/rdma/ib_verbs.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..30fa96e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -479,6 +479,7 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
 /* Address format                       0x000FF000 */
 #define RDMA_CORE_CAP_AF_IB             0x00001000
 #define RDMA_CORE_CAP_ETH_AH            0x00002000
+#define RDMA_CORE_CAP_OPA_AH            0x00004000
 
 /* Protocol                             0xFFF00000 */
 #define RDMA_CORE_CAP_PROT_IB           0x00100000
@@ -2472,6 +2473,26 @@ static inline bool rdma_cap_eth_ah(const struct ib_device *device, u8 port_num)
 }
 
 /**
+ * rdma_cap_opa_ah - Check if the port of device has the capability
+ * OPA Address handle
+ * @device: Device to check
+ * @port_num: Port number to check
+ *
+ * OPA Address handles enable use of 32 bit LIDs by using a specially
+ * formatted GID field to carry the LID. This check enables kernel
+ * components to identify such a scheme so that they can then try
+ * to make use of the LID in the GID field.
+ *
+ * Return: true if we are running as a OPA device which enables
+ * 32 bit LIDs to be used in the fabric.
+ */
+static inline bool rdma_cap_opa_ah(struct ib_device *device, u8 port_num)
+{
+	return (device->port_immutable[port_num].core_cap_flags &
+		RDMA_CORE_CAP_OPA_AH) == RDMA_CORE_CAP_OPA_AH;
+}
+
+/**
  * rdma_max_mad_size - Return the max MAD size required by this RDMA Port.
  *
  * @device: Device
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 00/11] IB/core: Add 32 bit LID support
From: Dasaratharaman Chandramouli @ 2016-11-22 19:38 UTC (permalink / raw)
  To: Dasaratharaman Chandramouli, Ira Weiny, Don Hiatt, linux-rdma,
	Doug Ledford

OPA devices can support more than 48K LIDs in the fabric. A node with a LID
greater than 0xbfff is called an 'extended lid'. In order to support verbs with
extended LIDs it is necessary to modify some of the RDMA data structures where
LIDs are currently only 16 bits in length.

This patch series follows on what was presented at the OFA Workshop.  Rather
than breaking the current UABI we propose to extend the LID address space by
sending a 'special' GID value down the verbs stack that has the 32-bit LID
programmed in it. By having a means to differentiate a regular GID from our
'special' GID, the underlying OPA device driver is able to retrieve the 32-bit
LIDs from the GID fields instead of picking them up from the 16 bit lid fields.

Internal to the kernel data structures such as struct ib_wc, struct
ib_port_attr and related ones have been modified to use 32 bit LID fields.
These changes are specific to the kernel and do not break the current UABI.


Node <-> SM interaction in getting extended LID information
----------------------------------------------------------------------------
1. Source application determines the GID of the destination through standard
   means and send a pathrecord query to the SM.
2. SM (which is OPA specific) recognizes that one or more nodes in the
   pathrecord request uses extended LIDs.
3. SM issues a pathrecord response. The SGID and DGID fields in the pathrecord
   response is the specially formulated GID.
4. Additionally, SM sets the hoplimit field of the pathrecord to 1.
5. Source receives the response and can determine the actual LID of the
   destination, if needed, from the response.

Source Node <-> Destination Node interaction in using extended LID information
-------------------------------------------------------------------------------
1. Source uses the pathrecord response from the SM to create an address handle
   to the destination (either at user or kernel space).
2. Since hoplimit field in the pathrecord is > 0, GRH fields are enabled in the
   address handle.
3. Address handle information is now passed down through the RDMA stack and
   reaches the driver.
4. Driver looks at the GRH fields in the address handle and determines that the
   GID in the GRH is actually a special GID.
5. Driver retrieves LID from GID field and uses 16B packets to send data
   on the wire.
6. Driver at the receiving side determines that a GRH needs to be added to the
   address handle before passing it on to the destination application.
7. Destination now receives the packet and can send back the response using the
   same address handle information.

There are some obvious limitations with this scheme:
----------------------------------------------------
1. Multicast packets which always need a GRH cannot use this scheme.
   Essentially multicast LIDs cannot be extended.
2. Subnet routed packets which also need a GRH cannot fully use this scheme.
   Specifically the LID of the router itself cannot be extended.
   The actual destination can still be extended.
3. Applications will need to use pathrecords to get destination address
   information. Any other out-of-band mechanisms are not guaranteed to work.
4. As an extension to 3, applications that 'validate' pathrecord responses need
   to be careful not to treat 0 LID field as an error condition.

Changes from V1:
1. Increase ah_attr.dlid from 16 to 32 bits

Dasaratharaman Chandramouli (9):
  IB/core: Add rdma_cap_opa_ah to expose opa address handles
  IB/core: Change port_attr.sm_lid from 16 to 32 bits
  IB/core: Change ah_attr.dlid from 16 to 32 bits
  IB/core: Change port_attr.lid size from 16 to 32 bits
  IB/mad: Change slid in RMPP recv from 16 to 32 bits
  IB/SA: Program extended LID in SM Address handle
  IB/IPoIB: Retrieve 32 bit LIDs from path records when running on OPA
    devices
  IB/IPoIB: Modify ipoib_get_net_dev_by_params to lookup gid table
  IB/srpt: Increase lid and sm_lid to 32 bits

Don Hiatt (2):
  IB/core: Change wc.slid from 16 to 32 bits
  IB/mad: Ensure DR MADs are correctly specified when using OPA devices

 drivers/infiniband/core/cm.c              |   4 +-
 drivers/infiniband/core/mad.c             | 104 ++++++++++++++++++++++++++----
 drivers/infiniband/core/mad_rmpp.c        |   2 +-
 drivers/infiniband/core/sa_query.c        |   8 ++-
 drivers/infiniband/core/user_mad.c        |   2 +-
 drivers/infiniband/core/uverbs_cmd.c      |  23 +++++--
 drivers/infiniband/core/uverbs_marshall.c |   2 +-
 drivers/infiniband/hw/hfi1/driver.c       |   4 +-
 drivers/infiniband/hw/hfi1/mad.c          |   2 +-
 drivers/infiniband/hw/hfi1/rc.c           |   2 +-
 drivers/infiniband/hw/hfi1/ruc.c          |  19 +++---
 drivers/infiniband/hw/hfi1/ud.c           |  10 +--
 drivers/infiniband/hw/hfi1/verbs.c        |   4 +-
 drivers/infiniband/hw/mlx4/ah.c           |   2 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c   |   2 +-
 drivers/infiniband/hw/mlx4/mad.c          |   8 +--
 drivers/infiniband/hw/mlx4/qp.c           |   2 +-
 drivers/infiniband/hw/mlx5/ah.c           |   2 +-
 drivers/infiniband/hw/mlx5/mad.c          |   2 +-
 drivers/infiniband/hw/mthca/mthca_av.c    |   2 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c   |   4 +-
 drivers/infiniband/hw/mthca/mthca_mad.c   |   4 +-
 drivers/infiniband/hw/mthca/mthca_qp.c    |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c  |   2 +-
 drivers/infiniband/hw/qib/qib_rc.c        |   2 +-
 drivers/infiniband/hw/qib/qib_ruc.c       |   9 +--
 drivers/infiniband/hw/qib/qib_ud.c        |   8 +--
 drivers/infiniband/sw/rdmavt/cq.c         |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib.h      |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |  11 ++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  63 +++++++++++++++++-
 drivers/infiniband/ulp/srpt/ib_srpt.h     |   4 +-
 include/rdma/ib_verbs.h                   |  29 +++++++--
 include/rdma/opa_addr.h                   |  68 +++++++++++++++++++
 34 files changed, 340 insertions(+), 78 deletions(-)
 create mode 100644 include/rdma/opa_addr.h

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Enabling peer to peer device transactions for PCIe devices
From: Serguei Sagalovitch @ 2016-11-22 18:59 UTC (permalink / raw)
  To: Dan Williams, Deucher, Alexander
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kuehling, Felix, Bridgman, John,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Koenig, Christian, Sander, Ben, Suthikulpanit, Suravee,
	Blinzer, Paul,
	Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAPcyv4i_5r2RVuV4F6V3ETbpKsf8jnMyQviZ7Legz3N4-v+9Og-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Dan,

I personally like "device-DAX" idea but my concerns are:

-  How well it will co-exists with the  DRM infrastructure / 
implementations
    in part dealing with CPU pointers?

-  How well we will be able to handle case when we need to "move"/"evict"
    memory/data to the new location so CPU pointer should point to the 
new physical location/address
     (and may be not in PCI device memory at all)?


Sincerely yours,
Serguei Sagalovitch

On 2016-11-22 01:11 PM, Dan Williams wrote:
> On Mon, Nov 21, 2016 at 12:36 PM, Deucher, Alexander
> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>> This is certainly not the first time this has been brought up, but I'd like to try and get some consensus on the best way to move this forward.  Allowing devices to talk directly improves performance and reduces latency by avoiding the use of staging buffers in system memory.  Also in cases where both devices are behind a switch, it avoids the CPU entirely.  Most current APIs (DirectGMA, PeerDirect, CUDA, HSA) that deal with this are pointer based.  Ideally we'd be able to take a CPU virtual address and be able to get to a physical address taking into account IOMMUs, etc.  Having struct pages for the memory would allow it to work more generally and wouldn't require as much explicit support in drivers that wanted to use it.
>>
>> Some use cases:
>> 1. Storage devices streaming directly to GPU device memory
>> 2. GPU device memory to GPU device memory streaming
>> 3. DVB/V4L/SDI devices streaming directly to GPU device memory
>> 4. DVB/V4L/SDI devices streaming directly to storage devices
>>
>> Here is a relatively simple example of how this could work for testing.  This is obviously not a complete solution.
>> - Device memory will be registered with Linux memory sub-system by created corresponding struct page structures for device memory
>> - get_user_pages_fast() will  return corresponding struct pages when CPU address points to the device memory
>> - put_page() will deal with struct pages for device memory
>>
> [..]
>> 4. iopmem
>> iopmem : A block device for PCIe memory (https://lwn.net/Articles/703895/)
> The change I suggest for this particular approach is to switch to
> "device-DAX" [1]. I.e. a character device for establishing DAX
> mappings rather than a block device plus a DAX filesystem. The pro of
> this approach is standard user pointers and struct pages rather than a
> new construct. The con is that this is done via an interface separate
> from the existing gpu and storage device. For example it would require
> a /dev/dax instance alongside a /dev/nvme interface, but I don't see
> that as a significant blocking concern.
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-October/007496.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox