* [PATCH for-next V5 01/12] IB/core: Add RoCE GID table
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list Matan Barak
` (11 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Refactoring the GID management code requires us to keep GIDs
alongside their meta information (the associated net_device).
This information is necessary in order to manage the GID
table successfully. For example, when a net_device is removed,
its associated GIDs need to be removed as well.
Add a GID table that supports lockless find; add and delete
of GIDs are serialized by a per-table mutex. The lockless nature
comes from using a sequence counter per table entry and detecting
whether this counter changed while the entry was being read.
To use this RoCE GID table, providers must implement a
modify_gid callback. The table is managed exclusively by
this roce_gid_table and the provider just needs to write
the data to the hardware.
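
The per-entry sequence counter described above can be illustrated outside
the kernel. What follows is only a minimal userspace sketch using C11
atomics, not the code in this patch: struct gid_entry, entry_init(),
entry_write() and entry_read() are hypothetical names, and writers are
assumed to be serialized by an outer lock, as table->lock does in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GID_LEN 16

/* Hypothetical stand-in for one GID table entry. */
struct gid_entry {
	atomic_uint seq;               /* even: stable, odd: write in progress */
	_Atomic uint8_t gid[GID_LEN];  /* payload guarded by seq */
};

static void entry_init(struct gid_entry *e)
{
	atomic_init(&e->seq, 0);
	for (size_t i = 0; i < GID_LEN; i++)
		atomic_init(&e->gid[i], 0);
}

/* Writer side. Assumes callers are already serialized by an outer lock. */
static void entry_write(struct gid_entry *e, const uint8_t *gid)
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	assert((s & 1) == 0); /* no other write may be in flight */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (size_t i = 0; i < GID_LEN; i++)
		atomic_store_explicit(&e->gid[i], gid[i], memory_order_relaxed);
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader side. Returns false if the snapshot raced with a writer. */
static bool entry_read(struct gid_entry *e, uint8_t *out)
{
	unsigned int s1 = atomic_load_explicit(&e->seq, memory_order_acquire);

	if (s1 & 1)
		return false; /* a write is in progress */
	for (size_t i = 0; i < GID_LEN; i++)
		out[i] = atomic_load_explicit(&e->gid[i], memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&e->seq, memory_order_relaxed) == s1;
}
```

A caller would retry or give up when entry_read() returns false, which
roughly mirrors roce_gid_table_get_gid() returning -EAGAIN. One difference
from the kernel primitive: read_seqcount_begin() waits for an even counter
value, while this sketch reports failure immediately.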
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/core/Makefile | 3 +-
drivers/infiniband/core/core_priv.h | 23 ++
drivers/infiniband/core/roce_gid_table.c | 470 +++++++++++++++++++++++++++++++
drivers/infiniband/hw/mlx4/main.c | 2 -
include/rdma/ib_verbs.h | 46 ++-
5 files changed, 540 insertions(+), 4 deletions(-)
create mode 100644 drivers/infiniband/core/roce_gid_table.c
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..fbeb72a 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
- device.o fmr_pool.o cache.o netlink.o
+ device.o fmr_pool.o cache.o netlink.o \
+ roce_gid_table.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936..a9e58418 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
+#include <net/net_namespace.h>
#include <rdma/ib_verbs.h>
@@ -51,4 +52,26 @@ void ib_cache_cleanup(void);
int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_table_get_gid(struct ib_device *ib_dev, u8 port, int index,
+ union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_table_find_gid(struct ib_device *ib_dev, const union ib_gid *gid,
+ struct net_device *ndev, u8 *port,
+ u16 *index);
+
+int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
+ const union ib_gid *gid,
+ u8 port, struct net_device *ndev,
+ u16 *index);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+ union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+ union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev);
+
#endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_table.c b/drivers/infiniband/core/roce_gid_table.c
new file mode 100644
index 0000000..f492cf1
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_table.c
@@ -0,0 +1,470 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <rdma/ib_cache.h>
+
+#include "core_priv.h"
+
+union ib_gid zgid;
+EXPORT_SYMBOL_GPL(zgid);
+
+static const struct ib_gid_attr zattr;
+
+enum gid_attr_find_mask {
+ GID_ATTR_FIND_MASK_GID = 1UL << 0,
+ GID_ATTR_FIND_MASK_NETDEV = 1UL << 1,
+};
+
+struct dev_put_rcu {
+ struct rcu_head rcu;
+ struct net_device *ndev;
+};
+
+static void put_ndev(struct rcu_head *rcu)
+{
+ struct dev_put_rcu *put_rcu =
+ container_of(rcu, struct dev_put_rcu, rcu);
+
+ dev_put(put_rcu->ndev);
+ kfree(put_rcu);
+}
+
+static int write_gid(struct ib_device *ib_dev, u8 port,
+ struct ib_roce_gid_table *table, int ix,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *attr)
+{
+ int ret;
+ struct dev_put_rcu *put_rcu;
+ struct net_device *old_net_dev;
+
+ write_seqcount_begin(&table->data_vec[ix].seq);
+
+ ret = ib_dev->modify_gid(ib_dev, port, ix, gid, attr,
+ &table->data_vec[ix].context);
+
+ old_net_dev = table->data_vec[ix].attr.ndev;
+ if (old_net_dev && old_net_dev != attr->ndev) {
+ put_rcu = kmalloc(sizeof(*put_rcu), GFP_KERNEL);
+ if (put_rcu) {
+ put_rcu->ndev = old_net_dev;
+ call_rcu(&put_rcu->rcu, put_ndev);
+ } else {
+ pr_warn("roce_gid_table: can't allocate rcu context, using synchronize\n");
+ synchronize_rcu();
+ dev_put(old_net_dev);
+ }
+ }
+ /* if modify_gid failed, just delete the old gid */
+ if (ret || !memcmp(gid, &zgid, sizeof(*gid))) {
+ gid = &zgid;
+ attr = &zattr;
+ table->data_vec[ix].context = NULL;
+ }
+ memcpy(&table->data_vec[ix].gid, gid, sizeof(*gid));
+ memcpy(&table->data_vec[ix].attr, attr, sizeof(*attr));
+ if (table->data_vec[ix].attr.ndev &&
+ table->data_vec[ix].attr.ndev != old_net_dev)
+ dev_hold(table->data_vec[ix].attr.ndev);
+
+ write_seqcount_end(&table->data_vec[ix].seq);
+
+ if (!ret) {
+ struct ib_event event;
+
+ event.device = ib_dev;
+ event.element.port_num = port;
+ event.event = IB_EVENT_GID_CHANGE;
+
+ ib_dispatch_event(&event);
+ }
+ return ret;
+}
+
+static int find_gid(struct ib_roce_gid_table *table, const union ib_gid *gid,
+ const struct ib_gid_attr *val, unsigned long mask)
+{
+ int i;
+
+ for (i = 0; i < table->sz; i++) {
+ struct ib_gid_attr *attr = &table->data_vec[i].attr;
+ unsigned int orig_seq = read_seqcount_begin(&table->data_vec[i].seq);
+
+ if (memcmp(gid, &table->data_vec[i].gid, sizeof(*gid)))
+ continue;
+
+ if (mask & GID_ATTR_FIND_MASK_NETDEV &&
+ attr->ndev != val->ndev)
+ continue;
+
+ if (!read_seqcount_retry(&table->data_vec[i].seq, orig_seq))
+ return i;
+ /* The sequence number changed under our feet,
+ * the GID entry is invalid. Continue to the
+ * next entry.
+ */
+ }
+
+ return -1;
+}
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+ union ib_gid *gid, struct ib_gid_attr *attr)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ int ix;
+ int ret = 0;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return -EOPNOTSUPP;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+
+ if (!table)
+ return -EPROTONOSUPPORT;
+
+ if (!memcmp(gid, &zgid, sizeof(*gid)))
+ return -EINVAL;
+
+ mutex_lock(&table->lock);
+
+ ix = find_gid(table, gid, attr, GID_ATTR_FIND_MASK_NETDEV);
+ if (ix >= 0)
+ goto out_unlock;
+
+ ix = find_gid(table, &zgid, NULL, 0);
+ if (ix < 0) {
+ ret = -ENOSPC;
+ goto out_unlock;
+ }
+
+ write_gid(ib_dev, port, table, ix, gid, attr);
+
+out_unlock:
+ mutex_unlock(&table->lock);
+ return ret;
+}
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+ union ib_gid *gid, struct ib_gid_attr *attr)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ int ix;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return 0;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+
+ if (!table)
+ return -EPROTONOSUPPORT;
+
+ mutex_lock(&table->lock);
+
+ ix = find_gid(table, gid, attr,
+ GID_ATTR_FIND_MASK_NETDEV);
+ if (ix < 0)
+ goto out_unlock;
+
+ write_gid(ib_dev, port, table, ix, &zgid, &zattr);
+
+out_unlock:
+ mutex_unlock(&table->lock);
+ return 0;
+}
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ int ix;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return 0;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+
+ if (!table)
+ return -EPROTONOSUPPORT;
+
+ mutex_lock(&table->lock);
+
+ for (ix = 0; ix < table->sz; ix++)
+ if (table->data_vec[ix].attr.ndev == ndev)
+ write_gid(ib_dev, port, table, ix, &zgid, &zattr);
+
+ mutex_unlock(&table->lock);
+ return 0;
+}
+
+int roce_gid_table_get_gid(struct ib_device *ib_dev, u8 port, int index,
+ union ib_gid *gid, struct ib_gid_attr *attr)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ union ib_gid local_gid;
+ struct ib_gid_attr local_attr;
+ unsigned int orig_seq;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return -EOPNOTSUPP;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+
+ if (!table)
+ return -EPROTONOSUPPORT;
+
+ if (index < 0 || index >= table->sz)
+ return -EINVAL;
+
+ orig_seq = read_seqcount_begin(&table->data_vec[index].seq);
+
+ memcpy(&local_gid, &table->data_vec[index].gid, sizeof(local_gid));
+ memcpy(&local_attr, &table->data_vec[index].attr, sizeof(local_attr));
+
+ if (read_seqcount_retry(&table->data_vec[index].seq, orig_seq))
+ return -EAGAIN;
+
+ memcpy(gid, &local_gid, sizeof(*gid));
+ if (attr)
+ memcpy(attr, &local_attr, sizeof(*attr));
+ return 0;
+}
+
+static int _roce_gid_table_find_gid(struct ib_device *ib_dev,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *val,
+ unsigned long mask,
+ u8 *port, u16 *index)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ u8 p;
+ int local_index;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return -ENOENT;
+
+ for (p = 0; p < ib_dev->phys_port_cnt; p++) {
+ if (!rdma_protocol_roce(ib_dev, p + rdma_start_port(ib_dev)))
+ continue;
+ table = ports_table[p];
+ if (!table)
+ continue;
+ local_index = find_gid(table, gid, val, mask);
+ if (local_index >= 0) {
+ if (index)
+ *index = local_index;
+ if (port)
+ *port = p + rdma_start_port(ib_dev);
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+int roce_gid_table_find_gid(struct ib_device *ib_dev, const union ib_gid *gid,
+ struct net_device *ndev, u8 *port, u16 *index)
+{
+ unsigned long mask = GID_ATTR_FIND_MASK_GID;
+ struct ib_gid_attr gid_attr_val = {.ndev = ndev};
+
+ if (ndev)
+ mask |= GID_ATTR_FIND_MASK_NETDEV;
+
+ return _roce_gid_table_find_gid(ib_dev, gid, &gid_attr_val,
+ mask, port, index);
+}
+
+int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
+ const union ib_gid *gid,
+ u8 port, struct net_device *ndev,
+ u16 *index)
+{
+ int local_index;
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ struct ib_roce_gid_table *table;
+ unsigned long mask = 0;
+ struct ib_gid_attr val = {.ndev = ndev};
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table || port < rdma_start_port(ib_dev) ||
+ port > rdma_end_port(ib_dev))
+ return -ENOENT;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+ if (!table)
+ return -ENOENT;
+
+ if (ndev)
+ mask |= GID_ATTR_FIND_MASK_NETDEV;
+
+ local_index = find_gid(table, gid, &val, mask);
+ if (local_index >= 0) {
+ if (index)
+ *index = local_index;
+ return 0;
+ }
+
+ return -ENOENT;
+}
+
+static struct ib_roce_gid_table *alloc_roce_gid_table(int sz)
+{
+ unsigned int i;
+ struct ib_roce_gid_table *table =
+ kzalloc(sizeof(struct ib_roce_gid_table), GFP_KERNEL);
+ if (!table)
+ return NULL;
+
+ table->data_vec = kcalloc(sz, sizeof(*table->data_vec), GFP_KERNEL);
+ if (!table->data_vec)
+ goto err_free_table;
+
+ mutex_init(&table->lock);
+
+ table->sz = sz;
+
+ for (i = 0; i < sz; i++)
+ seqcount_init(&table->data_vec[i].seq);
+
+ return table;
+
+err_free_table:
+ kfree(table);
+ return NULL;
+}
+
+static void free_roce_gid_table(struct ib_device *ib_dev, u8 port,
+ struct ib_roce_gid_table *table)
+{
+ int i;
+
+ if (!table)
+ return;
+
+ for (i = 0; i < table->sz; ++i) {
+ if (memcmp(&table->data_vec[i].gid, &zgid,
+ sizeof(table->data_vec[i].gid)))
+ write_gid(ib_dev, port, table, i, &zgid, &zattr);
+ }
+ kfree(table->data_vec);
+ kfree(table);
+}
+
+static int roce_gid_table_setup_one(struct ib_device *ib_dev)
+{
+ u8 port;
+ struct ib_roce_gid_table **table;
+ int err = 0;
+
+ if (!ib_dev->modify_gid)
+ return -EOPNOTSUPP;
+
+ table = kcalloc(ib_dev->phys_port_cnt, sizeof(*table), GFP_KERNEL);
+
+ if (!table) {
+ pr_warn("failed to allocate roce addr table for %s\n",
+ ib_dev->name);
+ return -ENOMEM;
+ }
+
+ for (port = 0; port < ib_dev->phys_port_cnt; port++) {
+ uint8_t rdma_port = port + rdma_start_port(ib_dev);
+
+ if (!rdma_protocol_roce(ib_dev, rdma_port))
+ continue;
+ table[port] =
+ alloc_roce_gid_table(
+ ib_dev->port_immutable[rdma_port].gid_tbl_len);
+ if (!table[port]) {
+ err = -ENOMEM;
+ goto rollback_table_setup;
+ }
+ }
+
+ ib_dev->cache.roce_gid_table = table;
+ return 0;
+
+rollback_table_setup:
+ for (port = 0; port < ib_dev->phys_port_cnt; port++)
+ free_roce_gid_table(ib_dev, port + rdma_start_port(ib_dev), table[port]);
+
+ kfree(table);
+ return err;
+}
+
+static void roce_gid_table_cleanup_one(struct ib_device *ib_dev,
+ struct ib_roce_gid_table **table)
+{
+ u8 port;
+
+ if (!table)
+ return;
+
+ for (port = 0; port < ib_dev->phys_port_cnt; port++)
+ free_roce_gid_table(ib_dev, port + rdma_start_port(ib_dev),
+ table[port]);
+
+ kfree(table);
+}
+
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 86c0c27..69ae464 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -93,8 +93,6 @@ static void init_query_mad(struct ib_smp *mad)
mad->method = IB_MGMT_METHOD_GET;
}
-static union ib_gid zgid;
-
static int check_flow_steering_support(struct mlx4_dev *dev)
{
int eth_num_ports = 0;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7d78794..72b62cd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -64,6 +64,27 @@ union ib_gid {
} global;
};
+extern union ib_gid zgid;
+
+struct ib_gid_attr {
+ struct net_device *ndev;
+};
+
+struct ib_roce_gid_table_entry {
+ seqcount_t seq;
+ union ib_gid gid;
+ struct ib_gid_attr attr;
+ void *context;
+};
+
+struct ib_roce_gid_table {
+ int active;
+ int sz;
+ /* locking against multiple writes in data_vec */
+ struct mutex lock;
+ struct ib_roce_gid_table_entry *data_vec;
+};
+
enum rdma_node_type {
/* IB values map to NodeInfo:NodeType. */
RDMA_NODE_IB_CA = 1,
@@ -272,7 +293,8 @@ enum ib_port_cap_flags {
IB_PORT_BOOT_MGMT_SUP = 1 << 23,
IB_PORT_LINK_LATENCY_SUP = 1 << 24,
IB_PORT_CLIENT_REG_SUP = 1 << 25,
- IB_PORT_IP_BASED_GIDS = 1 << 26
+ IB_PORT_IP_BASED_GIDS = 1 << 26,
+ IB_PORT_ROCE = 1 << 27,
};
enum ib_port_width {
@@ -1476,6 +1498,7 @@ struct ib_cache {
struct ib_pkey_cache **pkey_cache;
struct ib_gid_cache **gid_cache;
u8 *lmc_cache;
+ struct ib_roce_gid_table **roce_gid_table;
};
struct ib_dma_mapping_ops {
@@ -1559,6 +1582,27 @@ struct ib_device {
int (*query_gid)(struct ib_device *device,
u8 port_num, int index,
union ib_gid *gid);
+ /* When calling modify_gid, the HW vendor's driver should
+ * modify the gid of device @device at gid index @index of
+ * port @port_num to be @gid. Meta-info of that gid (for example,
+ * the network device related to this gid) is available
+ * at @attr. @context allows the HW vendor driver to store extra
+ * information together with a GID entry. The HW vendor may allocate
+ * memory to contain this information and store it in @context when a
+ * new GID entry is written to. Upon the deletion of a GID entry,
+ * the HW vendor must free any allocated memory. The caller will clear
+ * @context afterwards. GID deletion is done by passing the zero gid.
+ * Params are consistent until the next call of modify_gid.
+ * The function should return 0 on success or error otherwise.
+ * The function could be called concurrently for different ports.
+ * This function is only called when roce_gid_table is used.
+ */
+ int (*modify_gid)(struct ib_device *device,
+ u8 port_num,
+ unsigned int index,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *attr,
+ void **context);
int (*query_pkey)(struct ib_device *device,
u8 port_num, u16 index, u16 *pkey);
int (*modify_device)(struct ib_device *device,
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 01/12] IB/core: Add RoCE GID table Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 03/12] IB/core: Add RoCE GID population Matan Barak
` (10 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Haggai Eran
From: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.
The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.
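
The locking scheme described above can be sketched in userspace with a
pthread mutex and rwlock standing in for device_mutex and lists_rwsem. This
is an illustration under stated assumptions, not the patch itself: struct
device, register_device(), count_devices() and note_add() are hypothetical
names, and the list is a simple singly linked list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device. */
struct device {
	int id;
	struct device *next;
};

static struct device *device_list;
static pthread_mutex_t device_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t lists_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Reader: only the read side of the rwlock is needed, never the mutex. */
static int count_devices(void)
{
	int n = 0;

	pthread_rwlock_rdlock(&lists_rwlock);
	for (struct device *d = device_list; d; d = d->next)
		n++;
	pthread_rwlock_unlock(&lists_rwlock);
	return n;
}

/*
 * Writer: the mutex serializes registrations; the write lock is held only
 * around the list update, and only after the add() callback has run.
 */
static void register_device(struct device *dev, void (*add)(struct device *))
{
	assert(dev != NULL);
	pthread_mutex_lock(&device_mutex);
	if (add)
		add(dev);
	pthread_rwlock_wrlock(&lists_rwlock);
	dev->next = device_list;
	device_list = dev;
	pthread_rwlock_unlock(&lists_rwlock);
	pthread_mutex_unlock(&device_mutex);
}

/* Example callback that iterates the list while registration is ongoing. */
static int seen_during_add = -1;

static void note_add(struct device *dev)
{
	(void)dev;
	seen_during_add = count_devices();
}
```

Because the write lock is not held while add() runs, a callback can take the
read lock and walk the list without deadlocking, and it does not yet see the
device that is being registered.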
This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.
This patch is also a part of [2] "Add network namespace support in
RDMA-CM" patch series.
[1] http://www.spinics.net/lists/linux-rdma/msg24733.html
[2] http://permalink.gmane.org/gmane.linux.drivers.rdma/25588
Cc: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/core/device.c | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8d07c12..7e83f0d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -55,17 +55,24 @@ struct ib_client_data {
struct workqueue_struct *ib_wq;
EXPORT_SYMBOL_GPL(ib_wq);
+/* The device_list and client_list contain devices and clients after their
+ * registration has completed, and the devices and clients are removed
+ * during unregistration. */
static LIST_HEAD(device_list);
static LIST_HEAD(client_list);
/*
- * device_mutex protects access to both device_list and client_list.
- * There's no real point to using multiple locks or something fancier
- * like an rwsem: we always access both lists, and we're always
- * modifying one list or the other list. In any case this is not a
- * hot path so there's no point in trying to optimize.
+ * device_mutex and lists_rwsem protect access to both device_list and
+ * client_list. device_mutex protects writer access by device and client
+ * registration / de-registration. lists_rwsem protects reader access to
+ * these lists. Iterators of these lists must lock it for read, while updates
+ * to the lists must be done with a write lock. A special case is when the
+ * device_mutex is locked. In this case locking the lists for read access is
+ * not necessary as the device_mutex implies it.
*/
static DEFINE_MUTEX(device_mutex);
+static DECLARE_RWSEM(lists_rwsem);
+
static int ib_device_check_mandatory(struct ib_device *device)
{
@@ -294,8 +301,6 @@ int ib_register_device(struct ib_device *device,
goto out;
}
- list_add_tail(&device->core_list, &device_list);
-
device->reg_state = IB_DEV_REGISTERED;
{
@@ -306,6 +311,10 @@ int ib_register_device(struct ib_device *device,
client->add(device);
}
+ down_write(&lists_rwsem);
+ list_add_tail(&device->core_list, &device_list);
+ up_write(&lists_rwsem);
+
out:
mutex_unlock(&device_mutex);
return ret;
@@ -326,12 +335,14 @@ void ib_unregister_device(struct ib_device *device)
mutex_lock(&device_mutex);
+ down_write(&lists_rwsem);
+ list_del(&device->core_list);
+ up_write(&lists_rwsem);
+
list_for_each_entry_reverse(client, &client_list, list)
if (client->remove)
client->remove(device);
- list_del(&device->core_list);
-
mutex_unlock(&device_mutex);
ib_device_unregister_sysfs(device);
@@ -364,11 +375,14 @@ int ib_register_client(struct ib_client *client)
mutex_lock(&device_mutex);
- list_add_tail(&client->list, &client_list);
list_for_each_entry(device, &device_list, core_list)
if (client->add && !add_client_context(device, client))
client->add(device);
+ down_write(&lists_rwsem);
+ list_add_tail(&client->list, &client_list);
+ up_write(&lists_rwsem);
+
mutex_unlock(&device_mutex);
return 0;
@@ -391,6 +405,10 @@ void ib_unregister_client(struct ib_client *client)
mutex_lock(&device_mutex);
+ down_write(&lists_rwsem);
+ list_del(&client->list);
+ up_write(&lists_rwsem);
+
list_for_each_entry(device, &device_list, core_list) {
if (client->remove)
client->remove(device);
@@ -403,7 +421,6 @@ void ib_unregister_client(struct ib_client *client)
}
spin_unlock_irqrestore(&device->client_data_lock, flags);
}
- list_del(&client->list);
mutex_unlock(&device_mutex);
}
--
2.1.0
* [PATCH for-next V5 03/12] IB/core: Add RoCE GID population
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 01/12] IB/core: Add RoCE GID table Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
[not found] ` <1433772735-22416-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 04/12] net/ipv6: Export addrconf_ifid_eui48 Matan Barak
` (9 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In order to populate the GID table, we need to listen for
events:
(a) IB device has been added or removed - used in order
to allocate/deallocate the table and populate
the GID table internally.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
(c) netdev up/down/change_addr - if a netdev is built on top of our
RoCE device, we need to add/delete its IPs.
When an event is received, multiple entries (each with a
different GID type) are added.
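
As an illustration of deriving a GID from an IP address for case (b), here
is a small userspace sketch that builds an IPv4-mapped IPv6 value
(::ffff:a.b.c.d) in a 16-byte GID. The mapping shown is an assumption based
on how the existing rdma_ip2gid() helper in include/rdma/ib_addr.h is
understood to treat AF_INET addresses; ipv4_to_gid() itself is a
hypothetical name and is not part of this patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/*
 * Fill gid with the IPv4-mapped IPv6 form of addr_be, where addr_be is an
 * IPv4 address in network byte order.
 */
static void ipv4_to_gid(uint32_t addr_be, uint8_t *gid)
{
	assert(gid != NULL);
	memset(gid, 0, GID_LEN);               /* bytes 0..9 are zero */
	gid[10] = 0xff;                        /* bytes 10..11 are ff:ff */
	gid[11] = 0xff;
	memcpy(&gid[12], &addr_be, sizeof(addr_be)); /* bytes 12..15: IPv4 */
}
```

For example, passing htonl(0xC0A8010A) (192.168.1.10) yields a GID whose
last four bytes are 192, 168, 1 and 10. An IPv6 address would simply be
copied into the 16 bytes unchanged.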
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/core_priv.h | 26 ++
drivers/infiniband/core/device.c | 77 +++++
drivers/infiniband/core/roce_gid_mgmt.c | 471 +++++++++++++++++++++++++++++++
drivers/infiniband/core/roce_gid_table.c | 52 ++++
include/rdma/ib_addr.h | 2 +-
include/rdma/ib_verbs.h | 8 +
7 files changed, 636 insertions(+), 2 deletions(-)
create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index fbeb72a..3ceb3f8 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
- roce_gid_table.o
+ roce_gid_table.o roce_gid_mgmt.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index a9e58418..eab4e6c 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@
#include <rdma/ib_verbs.h>
+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
int ib_device_register_sysfs(struct ib_device *device,
int (*port_callback)(struct ib_device *,
u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+ struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+ struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+ roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie);
+
int roce_gid_table_get_gid(struct ib_device *ib_dev, u8 port, int index,
union ib_gid *gid, struct ib_gid_attr *attr);
@@ -65,6 +83,9 @@ int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
u8 port, struct net_device *ndev,
u16 *index);
+int roce_gid_table_setup(void);
+void roce_gid_table_cleanup(void);
+
int roce_add_gid(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr);
@@ -74,4 +95,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
struct net_device *ndev);
+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
#endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 7e83f0d..84edb9a 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
#include <linux/init.h>
#include <linux/mutex.h>
#include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>
#include "core_priv.h"
@@ -597,6 +598,79 @@ int ib_query_gid(struct ib_device *device,
EXPORT_SYMBOL(ib_query_gid);
/**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev with
+ * respect to netdev
+ * @ib_dev: IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE port
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev
+ * which are relaying Ethernet packets to a specific
+ * (possibly virtual) netdevice according to filter.
+ */
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+ roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie)
+{
+ u8 port;
+
+ if (ib_dev->modify_gid)
+ for (port = rdma_start_port(ib_dev); port <= rdma_end_port(ib_dev);
+ port++)
+ if (rdma_protocol_roce(ib_dev, port)) {
+ struct net_device *idev = NULL;
+
+ rcu_read_lock();
+ if (ib_dev->get_netdev)
+ idev = ib_dev->get_netdev(ib_dev, port);
+
+ if (idev &&
+ idev->reg_state >= NETREG_UNREGISTERED)
+ idev = NULL;
+
+ if (idev)
+ dev_hold(idev);
+
+ rcu_read_unlock();
+
+ if (filter(ib_dev, port, idev, filter_cookie))
+ cb(ib_dev, port, idev, cookie);
+
+ if (idev)
+ dev_put(idev);
+ }
+}
+
+/**
+ * ib_enum_roce_ports_of_netdev - enumerate RoCE ports of a netdev
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE port
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports which are relaying
+ * Ethernet packets to a specific (possibly virtual) netdevice
+ * according to filter.
+ */
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie)
+{
+ struct ib_device *dev;
+
+ down_read(&lists_rwsem);
+ list_for_each_entry_rcu(dev, &device_list, core_list)
+ ib_dev_roce_ports_of_netdev(dev, filter, filter_cookie, cb,
+ cookie);
+ up_read(&lists_rwsem);
+}
+
+/**
* ib_query_pkey - Get P_Key table entry
* @device:Device to query
* @port_num:Port number to query
@@ -751,6 +825,8 @@ static int __init ib_core_init(void)
goto err_sysfs;
}
+ roce_gid_table_setup();
+
ret = ib_cache_setup();
if (ret) {
printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
@@ -772,6 +848,7 @@ err:
static void __exit ib_core_cleanup(void)
{
+ roce_gid_table_cleanup();
ib_cache_cleanup();
ibnl_cleanup();
ib_sysfs_cleanup();
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
new file mode 100644
index 0000000..70616fc
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -0,0 +1,471 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "core_priv.h"
+
+#include <linux/in.h>
+#include <linux/in6.h>
+
+/* For in6_dev_get/in6_dev_put */
+#include <net/addrconf.h>
+
+#include <rdma/ib_cache.h>
+#include <rdma/ib_addr.h>
+
+struct workqueue_struct *roce_gid_mgmt_wq;
+
+enum gid_op_type {
+ GID_DEL = 0,
+ GID_ADD
+};
+
+struct update_gid_event_work {
+ struct work_struct work;
+ union ib_gid gid;
+ struct ib_gid_attr gid_attr;
+ enum gid_op_type gid_op;
+};
+
+#define ROCE_NETDEV_CALLBACK_SZ 2
+struct netdev_event_work_cmd {
+ roce_netdev_callback cb;
+ roce_netdev_filter filter;
+};
+
+struct netdev_event_work {
+ struct work_struct work;
+ struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ];
+ struct net_device *ndev;
+};
+
+static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
+ u8 port, union ib_gid *gid,
+ struct ib_gid_attr *gid_attr)
+{
+ if (rdma_protocol_roce(ib_dev, port)) {
+ switch (gid_op) {
+ case GID_ADD:
+ roce_add_gid(ib_dev, port,
+ gid, gid_attr);
+ break;
+ case GID_DEL:
+ roce_del_gid(ib_dev, port,
+ gid, gid_attr);
+ break;
+ }
+ }
+}
+
+static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *rdev;
+ struct net_device *mdev;
+ struct net_device *ndev = (struct net_device *)cookie;
+
+ if (!idev)
+ return 0;
+
+ rcu_read_lock();
+ mdev = netdev_master_upper_dev_get_rcu(idev);
+ rdev = rdma_vlan_dev_real_dev(ndev);
+ rcu_read_unlock();
+
+ return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+}
+
+static int pass_all_filter(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ return 1;
+}
+
+static void update_gid_ip(enum gid_op_type gid_op,
+ struct ib_device *ib_dev,
+ u8 port, struct net_device *ndev,
+ const struct sockaddr *addr)
+{
+ union ib_gid gid;
+ struct ib_gid_attr gid_attr;
+
+ rdma_ip2gid(addr, &gid);
+ memset(&gid_attr, 0, sizeof(gid_attr));
+ gid_attr.ndev = ndev;
+
+ update_gid(gid_op, ib_dev, port, &gid, &gid_attr);
+}
+
+static void enum_netdev_ipv4_ips(struct ib_device *ib_dev,
+ u8 port, struct net_device *ndev)
+{
+ struct in_device *in_dev;
+
+ if (ndev->reg_state >= NETREG_UNREGISTERING)
+ return;
+
+ in_dev = in_dev_get(ndev);
+ if (!in_dev)
+ return;
+
+ for_ifa(in_dev) {
+ struct sockaddr_in ip;
+
+ ip.sin_family = AF_INET;
+ ip.sin_addr.s_addr = ifa->ifa_address;
+ update_gid_ip(GID_ADD, ib_dev, port, ndev,
+ (struct sockaddr *)&ip);
+ }
+ endfor_ifa(in_dev);
+
+ in_dev_put(in_dev);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static void enum_netdev_ipv6_ips(struct ib_device *ib_dev,
+ u8 port, struct net_device *ndev)
+{
+ struct inet6_ifaddr *ifp;
+ struct inet6_dev *in6_dev;
+ struct sin6_list {
+ struct list_head list;
+ struct sockaddr_in6 sin6;
+ };
+ struct sin6_list *sin6_iter;
+ struct sin6_list *sin6_temp;
+ struct ib_gid_attr gid_attr = {.ndev = ndev};
+ LIST_HEAD(sin6_list);
+
+ if (ndev->reg_state >= NETREG_UNREGISTERING)
+ return;
+
+ in6_dev = in6_dev_get(ndev);
+ if (!in6_dev)
+ return;
+
+ read_lock_bh(&in6_dev->lock);
+ list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
+ struct sin6_list *entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
+
+ if (!entry) {
+ pr_warn("roce_gid_mgmt: couldn't allocate entry for IPv6 update\n");
+ continue;
+ }
+
+ entry->sin6.sin6_family = AF_INET6;
+ entry->sin6.sin6_addr = ifp->addr;
+ list_add_tail(&entry->list, &sin6_list);
+ }
+ read_unlock_bh(&in6_dev->lock);
+
+ in6_dev_put(in6_dev);
+
+ list_for_each_entry_safe(sin6_iter, sin6_temp, &sin6_list, list) {
+ union ib_gid gid;
+
+ rdma_ip2gid((const struct sockaddr *)&sin6_iter->sin6, &gid);
+ update_gid(GID_ADD, ib_dev, port, &gid, &gid_attr);
+ list_del(&sin6_iter->list);
+ kfree(sin6_iter);
+ }
+}
+#endif
+
+static void add_netdev_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+
+ enum_netdev_ipv4_ips(ib_dev, port, ndev);
+#if IS_ENABLED(CONFIG_IPV6)
+ enum_netdev_ipv6_ips(ib_dev, port, ndev);
+#endif
+}
+
+static void del_netdev_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+
+ roce_del_all_netdev_gids(ib_dev, port, ndev);
+}
+
+static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
+ u8 port,
+ struct net_device *idev,
+ void *cookie)
+{
+ struct net *net;
+ struct net_device *ndev;
+
+ /* Lock the rtnl to make sure the netdevs do not move under
+ * our feet
+ */
+ rtnl_lock();
+ for_each_net(net)
+ for_each_netdev(net, ndev)
+ if (is_eth_port_of_netdev(ib_dev, port, idev, ndev))
+ add_netdev_ips(ib_dev, port, idev, ndev);
+ rtnl_unlock();
+}
+
+/* This function will rescan all of the network devices in the system
+ * and add their gids, as needed, to the relevant RoCE devices. Will
+ * take rtnl and the IB device list mutexes. Must not be called from
+ * ib_wq or deadlock will happen. */
+int roce_rescan_device(struct ib_device *ib_dev)
+{
+ ib_dev_roce_ports_of_netdev(ib_dev, pass_all_filter, NULL,
+ enum_all_gids_of_dev_cb, NULL);
+
+ return 0;
+}
+
+static void callback_for_addr_gid_device_scan(struct ib_device *device,
+ u8 port,
+ struct net_device *idev,
+ void *cookie)
+{
+ struct update_gid_event_work *parsed = cookie;
+
+ update_gid(parsed->gid_op, device,
+ port, &parsed->gid,
+ &parsed->gid_attr);
+}
+
+/* The following functions operate on all IB devices. netdevice_event and
+ * addr_event execute ib_enum_roce_ports_of_netdev through a work.
+ * ib_enum_roce_ports_of_netdev iterates through all IB devices, thus proper
+ * usage of SRCU is required
+ */
+
+static void netdevice_event_work_handler(struct work_struct *_work)
+{
+ struct netdev_event_work *work =
+ container_of(_work, struct netdev_event_work, work);
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(work->cmds) && work->cmds[i].cb; i++)
+ ib_enum_roce_ports_of_netdev(work->cmds[i].filter, work->ndev,
+ work->cmds[i].cb, work->ndev);
+
+ dev_put(work->ndev);
+ kfree(work);
+}
+
+static int netdevice_event(struct notifier_block *this, unsigned long event,
+ void *ptr)
+{
+ static const struct netdev_event_work_cmd add_cmd = {
+ .cb = add_netdev_ips, .filter = is_eth_port_of_netdev};
+ static const struct netdev_event_work_cmd del_cmd = {
+ .cb = del_netdev_ips, .filter = pass_all_filter};
+ struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+ struct netdev_event_work *ndev_work;
+ struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ] = { {NULL} };
+
+ if (ndev->type != ARPHRD_ETHER)
+ return NOTIFY_DONE;
+
+ switch (event) {
+ case NETDEV_REGISTER:
+ case NETDEV_UP:
+ cmds[0] = add_cmd;
+ break;
+
+ case NETDEV_UNREGISTER:
+ if (ndev->reg_state < NETREG_UNREGISTERED)
+ cmds[0] = del_cmd;
+ else
+ return NOTIFY_DONE;
+ break;
+
+ case NETDEV_CHANGEADDR:
+ cmds[0] = del_cmd;
+ cmds[1] = add_cmd;
+ break;
+ default:
+ return NOTIFY_DONE;
+ }
+
+ ndev_work = kmalloc(sizeof(*ndev_work), GFP_KERNEL);
+ if (!ndev_work) {
+ pr_warn("roce_gid_mgmt: can't allocate work for netdevice_event\n");
+ return NOTIFY_DONE;
+ }
+
+ memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
+ ndev_work->ndev = ndev;
+ dev_hold(ndev);
+ INIT_WORK(&ndev_work->work, netdevice_event_work_handler);
+
+ queue_work(roce_gid_mgmt_wq, &ndev_work->work);
+
+ return NOTIFY_DONE;
+}
+
+static void update_gid_event_work_handler(struct work_struct *_work)
+{
+ struct update_gid_event_work *work =
+ container_of(_work, struct update_gid_event_work, work);
+
+ ib_enum_roce_ports_of_netdev(is_eth_port_of_netdev, work->gid_attr.ndev,
+ callback_for_addr_gid_device_scan, work);
+
+ dev_put(work->gid_attr.ndev);
+ kfree(work);
+}
+
+static int addr_event(struct notifier_block *this, unsigned long event,
+ struct sockaddr *sa, struct net_device *ndev)
+{
+ struct update_gid_event_work *work;
+ enum gid_op_type gid_op;
+
+ if (ndev->type != ARPHRD_ETHER)
+ return NOTIFY_DONE;
+
+ switch (event) {
+ case NETDEV_UP:
+ gid_op = GID_ADD;
+ break;
+
+ case NETDEV_DOWN:
+ gid_op = GID_DEL;
+ break;
+
+ default:
+ return NOTIFY_DONE;
+ }
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (!work) {
+ pr_warn("roce_gid_mgmt: Couldn't allocate work for addr_event\n");
+ return NOTIFY_DONE;
+ }
+
+ INIT_WORK(&work->work, update_gid_event_work_handler);
+
+ rdma_ip2gid(sa, &work->gid);
+ work->gid_op = gid_op;
+
+ memset(&work->gid_attr, 0, sizeof(work->gid_attr));
+ dev_hold(ndev);
+ work->gid_attr.ndev = ndev;
+
+ queue_work(roce_gid_mgmt_wq, &work->work);
+
+ return NOTIFY_DONE;
+}
+
+static int inetaddr_event(struct notifier_block *this, unsigned long event,
+ void *ptr)
+{
+ struct sockaddr_in in;
+ struct net_device *ndev;
+ struct in_ifaddr *ifa = ptr;
+
+ in.sin_family = AF_INET;
+ in.sin_addr.s_addr = ifa->ifa_address;
+ ndev = ifa->ifa_dev->dev;
+
+ return addr_event(this, event, (struct sockaddr *)&in, ndev);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int inet6addr_event(struct notifier_block *this, unsigned long event,
+ void *ptr)
+{
+ struct sockaddr_in6 in6;
+ struct net_device *ndev;
+ struct inet6_ifaddr *ifa6 = ptr;
+
+ in6.sin6_family = AF_INET6;
+ in6.sin6_addr = ifa6->addr;
+ ndev = ifa6->idev->dev;
+
+ return addr_event(this, event, (struct sockaddr *)&in6, ndev);
+}
+#endif
+
+static struct notifier_block nb_netdevice = {
+ .notifier_call = netdevice_event
+};
+
+static struct notifier_block nb_inetaddr = {
+ .notifier_call = inetaddr_event
+};
+
+#if IS_ENABLED(CONFIG_IPV6)
+static struct notifier_block nb_inet6addr = {
+ .notifier_call = inet6addr_event
+};
+#endif
+
+int __init roce_gid_mgmt_init(void)
+{
+ roce_gid_mgmt_wq = alloc_ordered_workqueue("roce_gid_mgmt_wq", 0);
+
+ if (!roce_gid_mgmt_wq) {
+ pr_warn("roce_gid_mgmt: can't allocate work queue\n");
+ return -ENOMEM;
+ }
+
+ register_inetaddr_notifier(&nb_inetaddr);
+#if IS_ENABLED(CONFIG_IPV6)
+ register_inet6addr_notifier(&nb_inet6addr);
+#endif
+ /* We rely on the netdevice notifier to enumerate all
+ * existing devices in the system. Register to this notifier
+ * last to make sure we will not miss any IP add/del
+ * callbacks.
+ */
+ register_netdevice_notifier(&nb_netdevice);
+
+ return 0;
+}
+
+void __exit roce_gid_mgmt_cleanup(void)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+ unregister_inet6addr_notifier(&nb_inet6addr);
+#endif
+ unregister_inetaddr_notifier(&nb_inetaddr);
+ unregister_netdevice_notifier(&nb_netdevice);
+ /* Ensure all gid deletion tasks complete before we go down,
+ * to avoid any reference to freed memory. By the time
+ * ib-core is removed, all physical devices have been removed,
+ * so no issue with remaining hardware contexts.
+ */
+ synchronize_rcu();
+ drain_workqueue(roce_gid_mgmt_wq);
+ destroy_workqueue(roce_gid_mgmt_wq);
+}
diff --git a/drivers/infiniband/core/roce_gid_table.c b/drivers/infiniband/core/roce_gid_table.c
index f492cf1..5e9e4dc 100644
--- a/drivers/infiniband/core/roce_gid_table.c
+++ b/drivers/infiniband/core/roce_gid_table.c
@@ -468,3 +468,55 @@ static void roce_gid_table_cleanup_one(struct ib_device *ib_dev,
kfree(table);
}
+static void roce_gid_table_client_cleanup_one(struct ib_device *ib_dev)
+{
+ struct ib_roce_gid_table **table = ib_dev->cache.roce_gid_table;
+
+ if (!table)
+ return;
+
+ ib_dev->cache.roce_gid_table = NULL;
+ /* smp_wmb is mandatory in order to make sure all executing works
+ * realize we're freeing this roce_gid_table. Every function which
+ * could be executed in a work, fetches ib_dev->cache.roce_gid_table
+ * once (READ_ONCE + smp_rmb) into a local variable.
+ * If it fetched a value != NULL, we wait for this work to finish by
+ * calling flush_workqueue. If it fetches NULL, it'll return immediately.
+ */
+ smp_wmb();
+ /* Make sure no gid update task is still referencing this device */
+ flush_workqueue(roce_gid_mgmt_wq);
+
+ roce_gid_table_cleanup_one(ib_dev, table);
+}
+
+static void roce_gid_table_client_setup_one(struct ib_device *ib_dev)
+{
+ if (!roce_gid_table_setup_one(ib_dev))
+ if (roce_rescan_device(ib_dev))
+ roce_gid_table_client_cleanup_one(ib_dev);
+}
+
+static struct ib_client table_client = {
+ .name = "roce_gid_table",
+ .add = roce_gid_table_client_setup_one,
+ .remove = roce_gid_table_client_cleanup_one
+};
+
+int __init roce_gid_table_setup(void)
+{
+ roce_gid_mgmt_init();
+
+ return ib_register_client(&table_client);
+}
+
+void __exit roce_gid_table_cleanup(void)
+{
+ ib_unregister_client(&table_client);
+
+ roce_gid_mgmt_cleanup();
+
+ flush_workqueue(system_wq);
+
+ rcu_barrier();
+}
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index fde33ac..850eec6 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -142,7 +142,7 @@ static inline u16 rdma_vlan_dev_vlan_id(const struct net_device *dev)
vlan_dev_vlan_id(dev) : 0xffff;
}
-static inline int rdma_ip2gid(struct sockaddr *addr, union ib_gid *gid)
+static inline int rdma_ip2gid(const struct sockaddr *addr, union ib_gid *gid)
{
switch (addr->sa_family) {
case AF_INET:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 72b62cd..05dcfad 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1579,6 +1579,14 @@ struct ib_device {
struct ib_port_attr *port_attr);
enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
u8 port_num);
+ /* When calling get_netdev, the HW vendor's driver should return the
+ * net device of device @device at port @port_num. The function
+ * is called in rtnl_lock. The HW vendor's device driver must guarantee
+ * to return NULL before the net device has reached
+ * NETDEV_UNREGISTER_FINAL state.
+ */
+ struct net_device *(*get_netdev)(struct ib_device *device,
+ u8 port_num);
int (*query_gid)(struct ib_device *device,
u8 port_num, int index,
union ib_gid *gid);
--
2.1.0
* [PATCH for-next V5 04/12] net/ipv6: Export addrconf_ifid_eui48
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (2 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 03/12] IB/core: Add RoCE GID population Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 05/12] IB/core: Add default GID for RoCE GID table Matan Barak
` (8 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
RoCE devices would like to have a default GID even
when the interface is down. To provide one, we use
the IPv6 link-local address as the default GID.
addrconf_ifid_eui48 is used to generate this address,
so move it from net/ipv6/addrconf.c to the addrconf.h
header as a static inline function.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
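For readers who want to check the resulting address by hand, here is a
minimal user-space sketch of the derivation. It mirrors the body of
addrconf_ifid_eui48() from this patch and prepends the fe80::/64 prefix the
way make_default_gid() does in patch 05/12. The name sketch_default_gid(),
the two length macros and the plain byte arrays are assumptions made for
illustration only; the kernel code works on struct net_device and
union ib_gid instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_LEN 6	/* same value as the kernel's ETH_ALEN */
#define SKETCH_GID_LEN 16	/* size of a GID in bytes */

/*
 * Hypothetical stand-in for make_default_gid(): writes fe80::/64 followed
 * by the interface identifier derived from a 48-bit MAC address.
 * dev_id models net_device->dev_id; pass 0 for an ordinary NIC.
 */
static void sketch_default_gid(const uint8_t *mac, uint16_t dev_id,
			       uint8_t *gid)
{
	uint8_t *eui = gid + 8;	/* low 64 bits hold the interface ID */

	assert(mac && gid);

	/* Link-local subnet prefix: fe80:0000:0000:0000 */
	memset(gid, 0, SKETCH_GID_LEN);
	gid[0] = 0xfe;
	gid[1] = 0x80;

	/* Split the MAC around the two middle bytes */
	memcpy(eui, mac, 3);
	memcpy(eui + 5, mac + 3, 3);

	if (dev_id) {
		/* Shared card: per-instance id, "u" bit left untouched */
		eui[3] = (dev_id >> 8) & 0xff;
		eui[4] = dev_id & 0xff;
	} else {
		/* Usual case: insert ff:fe and flip the "u" bit */
		eui[3] = 0xff;
		eui[4] = 0xfe;
		eui[0] ^= 2;
	}
}
```

Calling sketch_default_gid() with the MAC 00:11:22:33:44:55 and dev_id 0
fills the buffer with fe80::211:22ff:fe33:4455, which is the link-local
address the IPv6 stack would normally assign to that interface.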
include/net/addrconf.h | 31 +++++++++++++++++++++++++++++++
net/ipv6/addrconf.c | 31 -------------------------------
2 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 80456f7..89890e7 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -91,6 +91,37 @@ int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2);
void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr);
+static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
+{
+ if (dev->addr_len != ETH_ALEN)
+ return -1;
+ memcpy(eui, dev->dev_addr, 3);
+ memcpy(eui + 5, dev->dev_addr + 3, 3);
+
+ /*
+ * The zSeries OSA network cards can be shared among various
+ * OS instances, but the OSA cards have only one MAC address.
+ * This leads to duplicate address conflicts in conjunction
+ * with IPv6 if more than one instance uses the same card.
+ *
+ * The driver for these cards can deliver a unique 16-bit
+ * identifier for each instance sharing the same card. It is
+ * placed instead of 0xFFFE in the interface identifier. The
+ * "u" bit of the interface identifier is not inverted in this
+ * case. Hence the resulting interface identifier has local
+ * scope according to RFC2373.
+ */
+ if (dev->dev_id) {
+ eui[3] = (dev->dev_id >> 8) & 0xFF;
+ eui[4] = dev->dev_id & 0xFF;
+ } else {
+ eui[3] = 0xFF;
+ eui[4] = 0xFE;
+ eui[0] ^= 2;
+ }
+ return 0;
+}
+
static inline unsigned long addrconf_timeout_fixup(u32 timeout,
unsigned int unit)
{
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 37b70e8..7170c7b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1845,37 +1845,6 @@ static void addrconf_leave_anycast(struct inet6_ifaddr *ifp)
__ipv6_dev_ac_dec(ifp->idev, &addr);
}
-static int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
-{
- if (dev->addr_len != ETH_ALEN)
- return -1;
- memcpy(eui, dev->dev_addr, 3);
- memcpy(eui + 5, dev->dev_addr + 3, 3);
-
- /*
- * The zSeries OSA network cards can be shared among various
- * OS instances, but the OSA cards have only one MAC address.
- * This leads to duplicate address conflicts in conjunction
- * with IPv6 if more than one instance uses the same card.
- *
- * The driver for these cards can deliver a unique 16-bit
- * identifier for each instance sharing the same card. It is
- * placed instead of 0xFFFE in the interface identifier. The
- * "u" bit of the interface identifier is not inverted in this
- * case. Hence the resulting interface identifier has local
- * scope according to RFC2373.
- */
- if (dev->dev_id) {
- eui[3] = (dev->dev_id >> 8) & 0xFF;
- eui[4] = dev->dev_id & 0xFF;
- } else {
- eui[3] = 0xFF;
- eui[4] = 0xFE;
- eui[0] ^= 2;
- }
- return 0;
-}
-
static int addrconf_ifid_eui64(u8 *eui, struct net_device *dev)
{
if (dev->addr_len != IEEE802154_ADDR_LEN)
--
2.1.0
* [PATCH for-next V5 05/12] IB/core: Add default GID for RoCE GID table
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (3 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 04/12] net/ipv6: Export addrconf_ifid_eui48 Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
[not found] ` <1433772735-22416-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 06/12] net: Add info for NETDEV_CHANGEUPPER event Matan Barak
` (7 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
When RoCE is used, a default GID address should be generated
for every supported RoCE type. These default GID addresses are
generated based on the IPv6 link-local address, but in contrast
to the GID based on the regular IPv6 link-local (as we generate
GID per IP address), these GIDs are also available if the net
device is down (in order to support loopback).
Moreover, these default GID addresses can't be deleted.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
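The rule that a default GID sits in a reserved slot and cannot be added or
deleted through the regular paths can be modelled in a few lines of
user-space C. This is only a sketch under simplifying assumptions: a single
port and net device, no table mutex, no seqcount and no modify_gid callback.
All sketch_* names are hypothetical. Unlike roce_add_gid(), the add helper
returns the slot index so the effect is easy to observe, and the reserved
slot is filled at init time rather than later by
roce_gid_table_set_default_gid().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GID_LEN  16
#define SKETCH_TABLE_SZ 4

struct sketch_entry {
	uint8_t gid[SKETCH_GID_LEN];
	bool default_gid;	/* slot is reserved for the default GID */
};

struct sketch_table {
	struct sketch_entry e[SKETCH_TABLE_SZ];
	uint8_t def[SKETCH_GID_LEN];	/* value make_default_gid() would give */
};

static const uint8_t sketch_zgid[SKETCH_GID_LEN];	/* all zero: free slot */

/* Reserve slot 0 for the default GID and populate it. */
static void sketch_init(struct sketch_table *t, const uint8_t *def)
{
	assert(t && def);
	memset(t, 0, sizeof(*t));
	memcpy(t->def, def, SKETCH_GID_LEN);
	t->e[0].default_gid = true;
	memcpy(t->e[0].gid, def, SKETCH_GID_LEN);
}

/* Returns the slot index holding gid, or a negative errno. */
static int sketch_add(struct sketch_table *t, const uint8_t *gid)
{
	int i, free_ix = -1;

	if (!memcmp(gid, sketch_zgid, SKETCH_GID_LEN))
		return -EINVAL;
	if (!memcmp(gid, t->def, SKETCH_GID_LEN))
		return -EPERM;	/* adding the default GID is refused */

	for (i = 0; i < SKETCH_TABLE_SZ; i++) {
		if (t->e[i].default_gid)
			continue;	/* never hand out the reserved slot */
		if (!memcmp(t->e[i].gid, gid, SKETCH_GID_LEN))
			return i;	/* already present */
		if (free_ix < 0 &&
		    !memcmp(t->e[i].gid, sketch_zgid, SKETCH_GID_LEN))
			free_ix = i;
	}
	if (free_ix < 0)
		return -ENOSPC;

	memcpy(t->e[free_ix].gid, gid, SKETCH_GID_LEN);
	return free_ix;
}

/* Returns 0, or -EPERM when asked to delete the default GID. */
static int sketch_del(struct sketch_table *t, const uint8_t *gid)
{
	int i;

	if (!memcmp(gid, t->def, SKETCH_GID_LEN))
		return -EPERM;

	for (i = 0; i < SKETCH_TABLE_SZ; i++) {
		if (!t->e[i].default_gid &&
		    !memcmp(t->e[i].gid, gid, SKETCH_GID_LEN)) {
			memset(t->e[i].gid, 0, SKETCH_GID_LEN);
			break;
		}
	}
	return 0;
}
```

In this model an IP-based GID always lands in slot 1 or later, and both
sketch_add() and sketch_del() fail with -EPERM for the default GID while
slot 0 keeps its content.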
drivers/infiniband/core/core_priv.h | 12 +++
drivers/infiniband/core/roce_gid_mgmt.c | 25 ++++-
drivers/infiniband/core/roce_gid_table.c | 162 ++++++++++++++++++++++++++++---
include/rdma/ib_verbs.h | 1 +
4 files changed, 185 insertions(+), 15 deletions(-)
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index eab4e6c..8da7a86 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -83,6 +83,16 @@ int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
u8 port, struct net_device *ndev,
u16 *index);
+enum roce_gid_table_default_mode {
+ ROCE_GID_TABLE_DEFAULT_MODE_SET,
+ ROCE_GID_TABLE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_table_set_default_gid(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev,
+ unsigned long gid_type_mask,
+ enum roce_gid_table_default_mode mode);
+
int roce_gid_table_setup(void);
void roce_gid_table_cleanup(void);
@@ -99,5 +109,7 @@ int roce_gid_mgmt_init(void);
void roce_gid_mgmt_cleanup(void);
int roce_rescan_device(struct ib_device *ib_dev);
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
+
#endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 70616fc..6dcd1c7 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -67,11 +67,18 @@ struct netdev_event_work {
struct net_device *ndev;
};
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port)
+{
+ return !!rdma_protocol_roce(ib_dev, port);
+}
+
static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
u8 port, union ib_gid *gid,
struct ib_gid_attr *gid_attr)
{
- if (rdma_protocol_roce(ib_dev, port)) {
+ unsigned long gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+ if (gid_type_mask) {
switch (gid_op) {
case GID_ADD:
roce_add_gid(ib_dev, port,
@@ -124,6 +131,21 @@ static void update_gid_ip(enum gid_op_type gid_op,
update_gid(gid_op, ib_dev, port, &gid, &gid_attr);
}
+static void enum_netdev_default_gids(struct ib_device *ib_dev,
+ u8 port, struct net_device *ndev,
+ struct net_device *idev)
+{
+ unsigned long gid_type_mask;
+
+ if (idev != ndev)
+ return;
+
+ gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+ roce_gid_table_set_default_gid(ib_dev, port, idev, gid_type_mask,
+ ROCE_GID_TABLE_DEFAULT_MODE_SET);
+}
+
static void enum_netdev_ipv4_ips(struct ib_device *ib_dev,
u8 port, struct net_device *ndev)
{
@@ -204,6 +226,7 @@ static void add_netdev_ips(struct ib_device *ib_dev, u8 port,
{
struct net_device *ndev = (struct net_device *)cookie;
+ enum_netdev_default_gids(ib_dev, port, ndev, idev);
enum_netdev_ipv4_ips(ib_dev, port, ndev);
#if IS_ENABLED(CONFIG_IPV6)
enum_netdev_ipv6_ips(ib_dev, port, ndev);
diff --git a/drivers/infiniband/core/roce_gid_table.c b/drivers/infiniband/core/roce_gid_table.c
index 5e9e4dc..f0e68dc 100644
--- a/drivers/infiniband/core/roce_gid_table.c
+++ b/drivers/infiniband/core/roce_gid_table.c
@@ -34,6 +34,7 @@
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <rdma/ib_cache.h>
+#include <net/addrconf.h>
#include "core_priv.h"
@@ -45,6 +46,7 @@ static const struct ib_gid_attr zattr;
enum gid_attr_find_mask {
GID_ATTR_FIND_MASK_GID = 1UL << 0,
GID_ATTR_FIND_MASK_NETDEV = 1UL << 1,
+ GID_ATTR_FIND_MASK_DEFAULT = 1UL << 2,
};
struct dev_put_rcu {
@@ -64,7 +66,8 @@ static void put_ndev(struct rcu_head *rcu)
static int write_gid(struct ib_device *ib_dev, u8 port,
struct ib_roce_gid_table *table, int ix,
const union ib_gid *gid,
- const struct ib_gid_attr *attr)
+ const struct ib_gid_attr *attr,
+ bool default_gid)
{
int ret;
struct dev_put_rcu *put_rcu;
@@ -72,6 +75,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
write_seqcount_begin(&table->data_vec[ix].seq);
+ table->data_vec[ix].default_gid = default_gid;
ret = ib_dev->modify_gid(ib_dev, port, ix, gid, attr,
&table->data_vec[ix].context);
@@ -114,7 +118,8 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
}
static int find_gid(struct ib_roce_gid_table *table, const union ib_gid *gid,
- const struct ib_gid_attr *val, unsigned long mask)
+ const struct ib_gid_attr *val, bool default_gid,
+ unsigned long mask)
{
int i;
@@ -122,13 +127,18 @@ static int find_gid(struct ib_roce_gid_table *table, const union ib_gid *gid,
struct ib_gid_attr *attr = &table->data_vec[i].attr;
unsigned int orig_seq = read_seqcount_begin(&table->data_vec[i].seq);
- if (memcmp(gid, &table->data_vec[i].gid, sizeof(*gid)))
+ if (mask & GID_ATTR_FIND_MASK_GID &&
+ memcmp(gid, &table->data_vec[i].gid, sizeof(*gid)))
continue;
if (mask & GID_ATTR_FIND_MASK_NETDEV &&
attr->ndev != val->ndev)
continue;
+ if (mask & GID_ATTR_FIND_MASK_DEFAULT &&
+ table->data_vec[i].default_gid != default_gid)
+ continue;
+
if (!read_seqcount_retry(&table->data_vec[i].seq, orig_seq))
return i;
/* The sequence number changed under our feet,
@@ -140,6 +150,12 @@ static int find_gid(struct ib_roce_gid_table *table, const union ib_gid *gid,
return -1;
}
+static void make_default_gid(struct net_device *dev, union ib_gid *gid)
+{
+ gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
+ addrconf_ifid_eui48(&gid->raw[8], dev);
+}
+
int roce_add_gid(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr)
{
@@ -148,6 +164,7 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,
struct ib_roce_gid_table *table;
int ix;
int ret = 0;
+ struct net_device *idev;
/* make sure we read the ports_table */
smp_rmb();
@@ -163,19 +180,37 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,
if (!memcmp(gid, &zgid, sizeof(*gid)))
return -EINVAL;
+ if (ib_dev->get_netdev) {
+ rcu_read_lock();
+ idev = ib_dev->get_netdev(ib_dev, port);
+ if (idev && attr->ndev != idev) {
+ union ib_gid default_gid;
+
+ /* Adding default GIDs is not permitted */
+ make_default_gid(idev, &default_gid);
+ if (!memcmp(gid, &default_gid, sizeof(*gid))) {
+ rcu_read_unlock();
+ return -EPERM;
+ }
+ }
+ rcu_read_unlock();
+ }
+
mutex_lock(&table->lock);
- ix = find_gid(table, gid, attr, GID_ATTR_FIND_MASK_NETDEV);
+ ix = find_gid(table, gid, attr, false, GID_ATTR_FIND_MASK_GID |
+ GID_ATTR_FIND_MASK_NETDEV);
if (ix >= 0)
goto out_unlock;
- ix = find_gid(table, &zgid, NULL, 0);
+ ix = find_gid(table, &zgid, NULL, false, GID_ATTR_FIND_MASK_GID |
+ GID_ATTR_FIND_MASK_DEFAULT);
if (ix < 0) {
ret = -ENOSPC;
goto out_unlock;
}
- write_gid(ib_dev, port, table, ix, gid, attr);
+ write_gid(ib_dev, port, table, ix, gid, attr, false);
out_unlock:
mutex_unlock(&table->lock);
@@ -188,6 +223,7 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
struct ib_roce_gid_table **ports_table =
READ_ONCE(ib_dev->cache.roce_gid_table);
struct ib_roce_gid_table *table;
+ union ib_gid default_gid;
int ix;
/* make sure we read the ports_table */
@@ -201,14 +237,23 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
if (!table)
return -EPROTONOSUPPORT;
+ if (attr->ndev) {
+ /* Deleting default GIDs is not permitted */
+ make_default_gid(attr->ndev, &default_gid);
+ if (!memcmp(gid, &default_gid, sizeof(*gid)))
+ return -EPERM;
+ }
+
mutex_lock(&table->lock);
- ix = find_gid(table, gid, attr,
- GID_ATTR_FIND_MASK_NETDEV);
+ ix = find_gid(table, gid, attr, false,
+ GID_ATTR_FIND_MASK_GID |
+ GID_ATTR_FIND_MASK_NETDEV |
+ GID_ATTR_FIND_MASK_DEFAULT);
if (ix < 0)
goto out_unlock;
- write_gid(ib_dev, port, table, ix, &zgid, &zattr);
+ write_gid(ib_dev, port, table, ix, &zgid, &zattr, false);
out_unlock:
mutex_unlock(&table->lock);
@@ -238,7 +283,7 @@ int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
for (ix = 0; ix < table->sz; ix++)
if (table->data_vec[ix].attr.ndev == ndev)
- write_gid(ib_dev, port, table, ix, &zgid, &zattr);
+ write_gid(ib_dev, port, table, ix, &zgid, &zattr, false);
mutex_unlock(&table->lock);
return 0;
@@ -306,7 +351,7 @@ static int _roce_gid_table_find_gid(struct ib_device *ib_dev,
table = ports_table[p];
if (!table)
continue;
- local_index = find_gid(table, gid, val, mask);
+ local_index = find_gid(table, gid, val, false, mask);
if (local_index >= 0) {
if (index)
*index = local_index;
@@ -341,7 +386,7 @@ int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
struct ib_roce_gid_table **ports_table =
READ_ONCE(ib_dev->cache.roce_gid_table);
struct ib_roce_gid_table *table;
- unsigned long mask = 0;
+ unsigned long mask = GID_ATTR_FIND_MASK_GID;
struct ib_gid_attr val = {.ndev = ndev};
/* make sure we read the ports_table */
@@ -358,7 +403,7 @@ int roce_gid_table_find_gid_by_port(struct ib_device *ib_dev,
if (ndev)
mask |= GID_ATTR_FIND_MASK_NETDEV;
- local_index = find_gid(table, gid, &val, mask);
+ local_index = find_gid(table, gid, &val, false, mask);
if (local_index >= 0) {
if (index)
*index = local_index;
@@ -405,12 +450,95 @@ static void free_roce_gid_table(struct ib_device *ib_dev, u8 port,
for (i = 0; i < table->sz; ++i) {
if (memcmp(&table->data_vec[i].gid, &zgid,
sizeof(table->data_vec[i].gid)))
- write_gid(ib_dev, port, table, i, &zgid, &zattr);
+ write_gid(ib_dev, port, table, i, &zgid, &zattr,
+ table->data_vec[i].default_gid);
}
kfree(table->data_vec);
kfree(table);
}
+void roce_gid_table_set_default_gid(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev,
+ unsigned long gid_type_mask,
+ enum roce_gid_table_default_mode mode)
+{
+ struct ib_roce_gid_table **ports_table =
+ READ_ONCE(ib_dev->cache.roce_gid_table);
+ union ib_gid gid;
+ struct ib_gid_attr gid_attr;
+ struct ib_roce_gid_table *table;
+
+ /* make sure we read the ports_table */
+ smp_rmb();
+
+ if (!ports_table)
+ return;
+
+ table = ports_table[port - rdma_start_port(ib_dev)];
+
+ if (!table)
+ return;
+
+ make_default_gid(ndev, &gid);
+ memset(&gid_attr, 0, sizeof(gid_attr));
+ gid_attr.ndev = ndev;
+ if (gid_type_mask) {
+ int ix;
+ union ib_gid current_gid;
+ struct ib_gid_attr current_gid_attr;
+
+ ix = find_gid(table, &gid, &gid_attr, true,
+ GID_ATTR_FIND_MASK_DEFAULT);
+
+ if (ix < 0) {
+ pr_warn("roce_gid_table: couldn't find index for default gid\n");
+ return;
+ }
+
+ mutex_lock(&table->lock);
+ if (!roce_gid_table_get_gid(ib_dev, port, ix,
+ &current_gid, &current_gid_attr) &&
+ mode == ROCE_GID_TABLE_DEFAULT_MODE_SET &&
+ !memcmp(&gid, &current_gid, sizeof(gid)) &&
+ !memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
+ goto unlock_mutex;
+
+ if ((memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
+ memcmp(&current_gid_attr, &zattr,
+ sizeof(current_gid_attr))) &&
+ write_gid(ib_dev, port, table, ix, &zgid, &zattr, true)) {
+ pr_warn("roce_gid_table: can't delete index %d for default gid %pI6\n",
+ ix, gid.raw);
+ goto unlock_mutex;
+ }
+
+ if (mode == ROCE_GID_TABLE_DEFAULT_MODE_SET)
+ if (write_gid(ib_dev, port, table, ix, &gid, &gid_attr,
+ true))
+ pr_warn("roce_gid_table: unable to add default gid %pI6\n",
+ gid.raw);
+ }
+
+unlock_mutex:
+ mutex_unlock(&table->lock);
+}
+
+static int roce_gid_table_reserve_default(struct ib_device *ib_dev, u8 port,
+ struct ib_roce_gid_table *table)
+{
+ unsigned long roce_gid_type_mask;
+
+ roce_gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+ if (roce_gid_type_mask) {
+ struct ib_roce_gid_table_entry *entry =
+ &table->data_vec[0];
+
+ entry->default_gid = true;
+ }
+
+ return 0;
+}
+
static int roce_gid_table_setup_one(struct ib_device *ib_dev)
{
u8 port;
@@ -440,6 +568,12 @@ static int roce_gid_table_setup_one(struct ib_device *ib_dev)
err = -ENOMEM;
goto rollback_table_setup;
}
+
+ err = roce_gid_table_reserve_default(ib_dev,
+ port + rdma_start_port(ib_dev),
+ table[port]);
+ if (err)
+ goto rollback_table_setup;
}
ib_dev->cache.roce_gid_table = table;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 05dcfad..1f918b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,6 +75,7 @@ struct ib_roce_gid_table_entry {
union ib_gid gid;
struct ib_gid_attr attr;
void *context;
+ bool default_gid;
};
struct ib_roce_gid_table {
--
2.1.0
* [PATCH for-next V5 06/12] net: Add info for NETDEV_CHANGEUPPER event
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (4 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 05/12] IB/core: Add default GID for RoCE GID table Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 07/12] IB/core: Add RoCE table bonding support Matan Barak
` (6 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Consumers of the NETDEV_CHANGEUPPER event sometimes need
to know which upper device was linked or unlinked and which
operation was carried out. Add this information to the
notifier info block.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
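For reviewers, a minimal userspace sketch of how a notifier consumer
might recover the extra fields from the generic info pointer. The
enum and struct netdev_changeupper_info mirror the definitions added
by this patch; the stub struct net_device, the one-field
struct netdev_notifier_info, the local container_of() and the
linked_upper() helper are assumptions made only for illustration and
are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; illustrative only. */
struct net_device {
	const char *name;
};

struct netdev_notifier_info {
	struct net_device *dev;
};

enum netdev_changeupper_event {
	NETDEV_CHANGEUPPER_LINK,
	NETDEV_CHANGEUPPER_UNLINK,
};

struct netdev_changeupper_info {
	struct netdev_notifier_info info; /* must be first */
	enum netdev_changeupper_event event;
	struct net_device *upper;
};

/* Local equivalent of the kernel macro, written for userspace. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical consumer: given the generic info pointer that a
 * notifier callback receives, return the upper device if it was
 * just linked, or NULL if it was unlinked.
 */
static struct net_device *linked_upper(struct netdev_notifier_info *ptr)
{
	struct netdev_changeupper_info *ci;

	assert(ptr != NULL);
	ci = container_of(ptr, struct netdev_changeupper_info, info);

	return ci->event == NETDEV_CHANGEUPPER_LINK ? ci->upper : NULL;
}
```

A callback would pass its info pointer to linked_upper() only after
checking that the event is NETDEV_CHANGEUPPER, since other events
carry a different info layout.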
include/linux/netdevice.h | 14 ++++++++++++++
net/core/dev.c | 12 ++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 05b9a69..6cd142a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3553,6 +3553,20 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
netdev_features_t features);
+enum netdev_changeupper_event {
+ NETDEV_CHANGEUPPER_LINK,
+ NETDEV_CHANGEUPPER_UNLINK,
+};
+
+struct netdev_changeupper_info {
+ struct netdev_notifier_info info; /* must be first */
+ enum netdev_changeupper_event event;
+ struct net_device *upper;
+};
+
+void netdev_changeupper_info_change(struct net_device *dev,
+ struct netdev_changeupper_info *info);
+
struct netdev_bonding_info {
ifslave slave;
ifbond master;
diff --git a/net/core/dev.c b/net/core/dev.c
index 2c1c67f..ba73be4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5198,6 +5198,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
void *private)
{
struct netdev_adjacent *i, *j, *to_i, *to_j;
+ struct netdev_changeupper_info changeupper_info;
int ret = 0;
ASSERT_RTNL();
@@ -5253,7 +5254,10 @@ static int __netdev_upper_dev_link(struct net_device *dev,
goto rollback_lower_mesh;
}
- call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+ changeupper_info.event = NETDEV_CHANGEUPPER_LINK;
+ changeupper_info.upper = upper_dev;
+ call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+ &changeupper_info.info);
return 0;
rollback_lower_mesh:
@@ -5349,6 +5353,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
struct net_device *upper_dev)
{
struct netdev_adjacent *i, *j;
+ struct netdev_changeupper_info changeupper_info;
ASSERT_RTNL();
__netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
@@ -5370,7 +5375,10 @@ void netdev_upper_dev_unlink(struct net_device *dev,
list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
__netdev_adjacent_dev_unlink(dev, i->dev);
- call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+ changeupper_info.event = NETDEV_CHANGEUPPER_UNLINK;
+ changeupper_info.upper = upper_dev;
+ call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+ &changeupper_info.info);
}
EXPORT_SYMBOL(netdev_upper_dev_unlink);
--
2.1.0
* [PATCH for-next V5 07/12] IB/core: Add RoCE table bonding support
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (5 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 06/12] net: Add info for NETDEV_CHANGEUPPER event Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
[not found] ` <1433772735-22416-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 08/12] IB/core: ib_cache routines should use roce_gid_table when needed Matan Barak
` (5 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Bonding requires special handling: when working in
active-backup mode, only the currently selected slave
should occupy the default GIDs and the master's GID.
Listen to bonding events and add the required GIDs
only for the active slave in the RoCE GID table.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
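For reviewers, a minimal userspace sketch of the slave-state
classification this patch relies on. The enum mirrors
bonding_slave_state from the patch; struct toy_netdev,
toy_slave_state() and toy_may_hold_default_gid() are hypothetical
stand-ins that model only the decision, not the RCU handling or the
real bonding structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum bonding_slave_state {
	BONDING_SLAVE_STATE_ACTIVE	= 1 << 0,
	BONDING_SLAVE_STATE_INACTIVE	= 1 << 1,
	BONDING_SLAVE_STATE_NA		= 1 << 2,
};

/* Toy stand-in for a net_device plus the bonding private data. */
struct toy_netdev {
	bool is_bond_master;
	/* NULL when the bond mode has no single active slave. */
	const struct toy_netdev *active_slave;
};

/* Classify dev relative to its upper device (may be NULL). */
static enum bonding_slave_state
toy_slave_state(const struct toy_netdev *dev, const struct toy_netdev *upper)
{
	assert(dev != NULL);

	if (upper && upper->is_bond_master && upper->active_slave)
		return dev == upper->active_slave ?
			BONDING_SLAVE_STATE_ACTIVE :
			BONDING_SLAVE_STATE_INACTIVE;

	return BONDING_SLAVE_STATE_NA;
}

/* Default GIDs belong to active slaves and to non-bonded devices. */
static bool toy_may_hold_default_gid(const struct toy_netdev *dev,
				     const struct toy_netdev *upper)
{
	return (toy_slave_state(dev, upper) &
		(BONDING_SLAVE_STATE_ACTIVE | BONDING_SLAVE_STATE_NA)) != 0;
}
```

Using bit values for the states lets a filter accept several states
with a single mask, which is how is_eth_port_of_netdev() combines
ACTIVE and NA in the patch.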
drivers/infiniband/core/roce_gid_mgmt.c | 327 ++++++++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 13 --
include/net/bonding.h | 7 +
3 files changed, 314 insertions(+), 33 deletions(-)
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 6dcd1c7..b019d4e 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@
/* For in6_dev_get/in6_dev_put */
#include <net/addrconf.h>
+#include <net/bonding.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
@@ -55,16 +56,17 @@ struct update_gid_event_work {
enum gid_op_type gid_op;
};
-#define ROCE_NETDEV_CALLBACK_SZ 2
+#define ROCE_NETDEV_CALLBACK_SZ 3
struct netdev_event_work_cmd {
roce_netdev_callback cb;
roce_netdev_filter filter;
+ struct net_device *ndev;
+ struct net_device *f_ndev;
};
struct netdev_event_work {
struct work_struct work;
struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ];
- struct net_device *ndev;
};
unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port)
@@ -92,22 +94,96 @@ static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
}
}
+#define IS_NETDEV_BONDING_MASTER(ndev) \
+ (((ndev)->priv_flags & IFF_BONDING) && \
+ ((ndev)->flags & IFF_MASTER))
+
+enum bonding_slave_state {
+ BONDING_SLAVE_STATE_ACTIVE = 1UL << 0,
+ BONDING_SLAVE_STATE_INACTIVE = 1UL << 1,
+ BONDING_SLAVE_STATE_NA = 1UL << 2,
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+ struct net_device *upper)
+{
+ if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+ struct net_device *pdev;
+
+ rcu_read_lock();
+ pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+ rcu_read_unlock();
+ if (pdev)
+ return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+ BONDING_SLAVE_STATE_INACTIVE;
+ }
+
+ return BONDING_SLAVE_STATE_NA;
+}
+
+static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
+{
+ struct net_device *_upper = NULL;
+ struct list_head *iter;
+
+ rcu_read_lock();
+ netdev_for_each_all_upper_dev_rcu(dev, _upper, iter) {
+ if (_upper == upper)
+ break;
+ }
+
+ rcu_read_unlock();
+ return _upper == upper;
+}
+
+static int _is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie,
+ unsigned long bond_state)
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+ struct net_device *rdev;
+ int res;
+
+ if (!idev)
+ return 0;
+
+ rcu_read_lock();
+ rdev = rdma_vlan_dev_real_dev(ndev);
+ if (!rdev)
+ rdev = ndev;
+
+ res = ((is_upper_dev_rcu(idev, ndev) &&
+ (is_eth_active_slave_of_bonding(idev, rdev) &
+ bond_state)) ||
+ rdev == idev);
+
+ rcu_read_unlock();
+ return res;
+}
+
static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
struct net_device *idev, void *cookie)
{
- struct net_device *rdev;
- struct net_device *mdev;
- struct net_device *ndev = (struct net_device *)cookie;
+ return _is_eth_port_of_netdev(ib_dev, port, idev, cookie,
+ BONDING_SLAVE_STATE_ACTIVE |
+ BONDING_SLAVE_STATE_NA);
+}
+static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *mdev;
+ int res;
if (!idev)
return 0;
rcu_read_lock();
mdev = netdev_master_upper_dev_get_rcu(idev);
- rdev = rdma_vlan_dev_real_dev(ndev);
+ res = is_eth_active_slave_of_bonding(idev, mdev) ==
+ BONDING_SLAVE_STATE_INACTIVE;
rcu_read_unlock();
- return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+ return res;
}
static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -116,6 +192,34 @@ static int pass_all_filter(struct ib_device *ib_dev, u8 port,
return 1;
}
+static int upper_device_filter(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+
+ return idev == ndev || is_upper_dev_rcu(idev, ndev);
+}
+
+static int bonding_slaves_filter(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *rdev;
+ struct net_device *ndev = (struct net_device *)cookie;
+ int res;
+
+ rdev = rdma_vlan_dev_real_dev(ndev);
+
+ ndev = rdev ? rdev : ndev;
+ if (!idev || !IS_NETDEV_BONDING_MASTER(ndev))
+ return 0;
+
+ rcu_read_lock();
+ res = is_upper_dev_rcu(idev, ndev);
+ rcu_read_unlock();
+
+ return res;
+}
+
static void update_gid_ip(enum gid_op_type gid_op,
struct ib_device *ib_dev,
u8 port, struct net_device *ndev,
@@ -137,8 +241,16 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
{
unsigned long gid_type_mask;
- if (idev != ndev)
+ rcu_read_lock();
+ if (!idev ||
+ ((idev != ndev && !is_upper_dev_rcu(idev, ndev)) ||
+ is_eth_active_slave_of_bonding(idev,
+ netdev_master_upper_dev_get_rcu(idev)) ==
+ BONDING_SLAVE_STATE_INACTIVE)) {
+ rcu_read_unlock();
return;
+ }
+ rcu_read_unlock();
gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
@@ -146,6 +258,37 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
ROCE_GID_TABLE_DEFAULT_MODE_SET);
}
+static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
+ u8 port, struct net_device *ndev,
+ struct net_device *idev)
+{
+ struct net_device *rdev = rdma_vlan_dev_real_dev(ndev);
+
+ if (!idev)
+ return;
+
+ if (!rdev)
+ rdev = ndev;
+
+ rcu_read_lock();
+
+ if (is_upper_dev_rcu(idev, ndev) &&
+ is_eth_active_slave_of_bonding(idev, rdev) ==
+ BONDING_SLAVE_STATE_INACTIVE) {
+ unsigned long gid_type_mask;
+
+ rcu_read_unlock();
+
+ gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+ roce_gid_table_set_default_gid(ib_dev, port, idev,
+ gid_type_mask,
+ ROCE_GID_TABLE_DEFAULT_MODE_DELETE);
+ } else {
+ rcu_read_unlock();
+ }
+}
+
static void enum_netdev_ipv4_ips(struct ib_device *ib_dev,
u8 port, struct net_device *ndev)
{
@@ -221,16 +364,22 @@ static void enum_netdev_ipv6_ips(struct ib_device *ib_dev,
}
#endif
+static void _add_netdev_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev)
+{
+ enum_netdev_ipv4_ips(ib_dev, port, ndev);
+#if IS_ENABLED(CONFIG_IPV6)
+ enum_netdev_ipv6_ips(ib_dev, port, ndev);
+#endif
+}
+
static void add_netdev_ips(struct ib_device *ib_dev, u8 port,
struct net_device *idev, void *cookie)
{
struct net_device *ndev = (struct net_device *)cookie;
enum_netdev_default_gids(ib_dev, port, ndev, idev);
- enum_netdev_ipv4_ips(ib_dev, port, ndev);
-#if IS_ENABLED(CONFIG_IPV6)
- enum_netdev_ipv6_ips(ib_dev, port, ndev);
-#endif
+ _add_netdev_ips(ib_dev, port, ndev);
}
static void del_netdev_ips(struct ib_device *ib_dev, u8 port,
@@ -284,6 +433,92 @@ static void callback_for_addr_gid_device_scan(struct ib_device *device,
&parsed->gid_attr);
}
+static void handle_netdev_upper(struct ib_device *ib_dev, u8 port,
+ void *cookie,
+ void (*handle_netdev)(struct ib_device *ib_dev,
+ u8 port,
+ struct net_device *ndev))
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+ struct upper_list {
+ struct list_head list;
+ struct net_device *upper;
+ };
+ struct net_device *upper;
+ struct list_head *iter;
+ struct upper_list *upper_iter;
+ struct upper_list *upper_temp;
+ LIST_HEAD(upper_list);
+
+ rcu_read_lock();
+ netdev_for_each_all_upper_dev_rcu(ndev, upper, iter) {
+ struct upper_list *entry = kmalloc(sizeof(*entry),
+ GFP_ATOMIC);
+
+ if (!entry) {
+ pr_info("roce_gid_mgmt: couldn't allocate entry to delete ndev\n");
+ continue;
+ }
+
+ list_add_tail(&entry->list, &upper_list);
+ dev_hold(upper);
+ entry->upper = upper;
+ }
+ rcu_read_unlock();
+
+ handle_netdev(ib_dev, port, ndev);
+ list_for_each_entry_safe(upper_iter, upper_temp, &upper_list,
+ list) {
+ handle_netdev(ib_dev, port, upper_iter->upper);
+ dev_put(upper_iter->upper);
+ list_del(&upper_iter->list);
+ kfree(upper_iter);
+ }
+}
+
+static void _roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+ struct net_device *ndev)
+{
+ roce_del_all_netdev_gids(ib_dev, port, ndev);
+}
+
+static void del_netdev_upper_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ handle_netdev_upper(ib_dev, port, cookie, _roce_del_all_netdev_gids);
+}
+
+static void add_netdev_upper_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ handle_netdev_upper(ib_dev, port, cookie, _add_netdev_ips);
+}
+
+static void del_netdev_default_ips_join(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *mdev;
+
+ rcu_read_lock();
+ mdev = netdev_master_upper_dev_get_rcu(idev);
+ if (mdev)
+ dev_hold(mdev);
+ rcu_read_unlock();
+
+ if (mdev) {
+ bond_delete_netdev_default_gids(ib_dev, port, mdev, idev);
+ dev_put(mdev);
+ }
+}
+
+static void del_netdev_default_ips(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+ struct net_device *ndev = (struct net_device *)cookie;
+
+ bond_delete_netdev_default_gids(ib_dev, port, ndev, idev);
+}
+
/* The following functions operate on all IB devices. netdevice_event and
* addr_event execute ib_enum_roce_ports_of_netdev through a work.
* ib_enum_roce_ports_of_netdev iterates through all IB devices, thus proper
@@ -296,11 +531,15 @@ static void netdevice_event_work_handler(struct work_struct *_work)
container_of(_work, struct netdev_event_work, work);
unsigned int i;
- for (i = 0; i < ARRAY_SIZE(work->cmds) && work->cmds[i].cb; i++)
- ib_enum_roce_ports_of_netdev(work->cmds[i].filter, work->ndev,
- work->cmds[i].cb, work->ndev);
+ for (i = 0; i < ARRAY_SIZE(work->cmds) && work->cmds[i].cb; i++) {
+ ib_enum_roce_ports_of_netdev(work->cmds[i].filter,
+ work->cmds[i].f_ndev,
+ work->cmds[i].cb,
+ work->cmds[i].ndev);
+ dev_put(work->cmds[i].ndev);
+ dev_put(work->cmds[i].f_ndev);
+ }
- dev_put(work->ndev);
kfree(work);
}
@@ -309,11 +548,24 @@ static int netdevice_event(struct notifier_block *this, unsigned long event,
{
static const struct netdev_event_work_cmd add_cmd = {
.cb = add_netdev_ips, .filter = is_eth_port_of_netdev};
+ static const struct netdev_event_work_cmd add_cmd_upper_ips = {
+ .cb = add_netdev_upper_ips, .filter = is_eth_port_of_netdev};
static const struct netdev_event_work_cmd del_cmd = {
.cb = del_netdev_ips, .filter = pass_all_filter};
+ static const struct netdev_event_work_cmd bonding_default_del_cmd_join = {
+ .cb = del_netdev_default_ips_join, .filter = is_eth_port_inactive_slave};
+ static const struct netdev_event_work_cmd bonding_default_del_cmd = {
+ .cb = del_netdev_default_ips, .filter = is_eth_port_inactive_slave};
+ static const struct netdev_event_work_cmd default_del_cmd = {
+ .cb = del_netdev_default_ips, .filter = pass_all_filter};
+ static const struct netdev_event_work_cmd bonding_event_ips_del_cmd = {
+ .cb = del_netdev_upper_ips, .filter = bonding_slaves_filter};
+ static const struct netdev_event_work_cmd upper_ips_del_cmd = {
+ .cb = del_netdev_upper_ips, .filter = upper_device_filter};
struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
struct netdev_event_work *ndev_work;
struct netdev_event_work_cmd cmds[ROCE_NETDEV_CALLBACK_SZ] = { {NULL} };
+ unsigned int i;
if (ndev->type != ARPHRD_ETHER)
return NOTIFY_DONE;
@@ -321,7 +573,8 @@ static int netdevice_event(struct notifier_block *this, unsigned long event,
switch (event) {
case NETDEV_REGISTER:
case NETDEV_UP:
- cmds[0] = add_cmd;
+ cmds[0] = bonding_default_del_cmd_join;
+ cmds[1] = add_cmd;
break;
case NETDEV_UNREGISTER:
@@ -332,9 +585,37 @@ static int netdevice_event(struct notifier_block *this, unsigned long event,
break;
case NETDEV_CHANGEADDR:
- cmds[0] = del_cmd;
+ cmds[0] = default_del_cmd;
cmds[1] = add_cmd;
break;
+
+ case NETDEV_CHANGEUPPER:
+ {
+ struct netdev_changeupper_info *changeupper_info =
+ container_of(ptr, struct netdev_changeupper_info, info);
+
+ if (changeupper_info->event ==
+ NETDEV_CHANGEUPPER_UNLINK) {
+ cmds[0] = upper_ips_del_cmd;
+ cmds[0].ndev = changeupper_info->upper;
+ cmds[1] = add_cmd;
+ } else if (changeupper_info->event ==
+ NETDEV_CHANGEUPPER_LINK) {
+ cmds[0] = bonding_default_del_cmd;
+ cmds[0].ndev = changeupper_info->upper;
+ cmds[1] = add_cmd_upper_ips;
+ cmds[1].ndev = changeupper_info->upper;
+ cmds[1].f_ndev = changeupper_info->upper;
+ }
+ }
+ break;
+
+ case NETDEV_BONDING_FAILOVER:
+ cmds[0] = bonding_event_ips_del_cmd;
+ cmds[1] = bonding_default_del_cmd_join;
+ cmds[2] = add_cmd_upper_ips;
+ break;
+
default:
return NOTIFY_DONE;
}
@@ -346,8 +627,14 @@ static int netdevice_event(struct notifier_block *this, unsigned long event,
}
memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
- ndev_work->ndev = ndev;
- dev_hold(ndev);
+ for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
+ if (!ndev_work->cmds[i].ndev)
+ ndev_work->cmds[i].ndev = ndev;
+ if (!ndev_work->cmds[i].f_ndev)
+ ndev_work->cmds[i].f_ndev = ndev;
+ dev_hold(ndev_work->cmds[i].ndev);
+ dev_hold(ndev_work->cmds[i].f_ndev);
+ }
INIT_WORK(&ndev_work->work, netdevice_event_work_handler);
queue_work(roce_gid_mgmt_wq, &ndev_work->work);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 4df2894..c4fe29a8 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -689,19 +689,6 @@ static int bond_option_mode_set(struct bonding *bond,
return 0;
}
-static struct net_device *__bond_option_active_slave_get(struct bonding *bond,
- struct slave *slave)
-{
- return bond_uses_primary(bond) && slave ? slave->dev : NULL;
-}
-
-struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond)
-{
- struct slave *slave = rcu_dereference(bond->curr_active_slave);
-
- return __bond_option_active_slave_get(bond, slave);
-}
-
static int bond_option_active_slave_set(struct bonding *bond,
const struct bond_opt_value *newval)
{
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 78ed135..81a94ed 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -307,6 +307,13 @@ static inline bool bond_uses_primary(struct bonding *bond)
return bond_mode_uses_primary(BOND_MODE(bond));
}
+static inline struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond)
+{
+ struct slave *slave = rcu_dereference(bond->curr_active_slave);
+
+ return bond_uses_primary(bond) && slave ? slave->dev : NULL;
+}
+
static inline bool bond_slave_is_up(struct slave *slave)
{
return netif_running(slave->dev) && netif_carrier_ok(slave->dev);
--
2.1.0
* [PATCH for-next V5 08/12] IB/core: ib_cache routines should use roce_gid_table when needed
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (6 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 07/12] IB/core: Add RoCE table bonding support Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 09/12] net/mlx4: Postpone the registration of net_device Matan Barak
` (4 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
When a port uses roce_gid_table, the following functions
(a) ib_find_cached_gid
(b) ib_get_cached_gid
should query the GID table accordingly.
In order to query it, roce_gid_table is initialized
when needed.
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
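For reviewers, a minimal userspace sketch of the dispatch order that
ib_get_cached_gid() follows after this patch: RoCE GID table first,
-EAGAIN for a RoCE port whose table is not available, legacy cache
otherwise. struct toy_gid, struct toy_port, toy_lookup() and
toy_get_cached_gid() are hypothetical; in particular, the error
returned by the toy lookup for a bad index is an assumption and may
differ from what roce_gid_table_get_gid() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_gid {
	unsigned char raw[16];
};

/* Toy model of one port; a NULL table means "not allocated". */
struct toy_port {
	bool is_roce;
	const struct toy_gid *roce_table;
	const struct toy_gid *legacy_cache;
	int table_len;
};

/* Bounds-checked read shared by both toy back ends. */
static int toy_lookup(const struct toy_gid *tbl, int len, int index,
		      struct toy_gid *out)
{
	if (!tbl || index < 0 || index >= len)
		return -ENOENT;

	*out = tbl[index];
	return 0;
}

static int toy_get_cached_gid(const struct toy_port *port, int index,
			      struct toy_gid *gid)
{
	assert(port != NULL && gid != NULL);

	/* Mirrors rdma_cap_roce_gid_table(): RoCE port with a table. */
	if (port->is_roce && port->roce_table)
		return toy_lookup(port->roce_table, port->table_len,
				  index, gid);

	/* RoCE port without a table: the caller should retry later. */
	if (port->is_roce)
		return -EAGAIN;

	return toy_lookup(port->legacy_cache, port->table_len, index, gid);
}
```

The point of the ordering is that a RoCE port never falls through to
the legacy cache, which this patch stops populating for such ports.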
drivers/infiniband/core/cache.c | 210 +++++++++++++++++++++++++++++----------
drivers/infiniband/core/device.c | 17 +++-
include/rdma/ib_verbs.h | 21 +++-
3 files changed, 193 insertions(+), 55 deletions(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 871da83..217e639 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -58,64 +58,133 @@ struct ib_update_work {
u8 port_num;
};
-int ib_get_cached_gid(struct ib_device *device,
- u8 port_num,
- int index,
- union ib_gid *gid)
+static int __ib_get_cached_gid(struct ib_device *device,
+ u8 port_num,
+ int index,
+ union ib_gid *gid)
{
struct ib_gid_cache *cache;
unsigned long flags;
- int ret = 0;
+ int ret = -ENOENT;
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
+ if (!device->cache.gid_cache)
+ return -ENOENT;
read_lock_irqsave(&device->cache.lock, flags);
cache = device->cache.gid_cache[port_num - rdma_start_port(device)];
-
- if (index < 0 || index >= cache->table_len)
- ret = -EINVAL;
- else
+ if (cache && index >= 0 && index < cache->table_len) {
*gid = cache->table[index];
+ ret = 0;
+ }
read_unlock_irqrestore(&device->cache.lock, flags);
-
return ret;
}
+
+int ib_get_cached_gid(struct ib_device *device,
+ u8 port_num,
+ int index,
+ union ib_gid *gid)
+{
+ if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
+ return -EINVAL;
+
+ if (rdma_cap_roce_gid_table(device, port_num))
+ return roce_gid_table_get_gid(device, port_num, index, gid,
+ NULL);
+
+ if (rdma_protocol_roce(device, port_num))
+ return -EAGAIN;
+
+ return __ib_get_cached_gid(device, port_num, index, gid);
+}
EXPORT_SYMBOL(ib_get_cached_gid);
-int ib_find_cached_gid(struct ib_device *device,
- const union ib_gid *gid,
- u8 *port_num,
- u16 *index)
+static int ___ib_find_cached_gid_by_port(struct ib_device *device,
+ u8 port_num,
+ const union ib_gid *gid,
+ u16 *index)
{
struct ib_gid_cache *cache;
+ u8 p = port_num - rdma_start_port(device);
+ int i;
+
+ if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
+ return -EINVAL;
+ if (rdma_cap_roce_gid_table(device, port_num))
+ return -EPROTONOSUPPORT;
+ if (!device->cache.gid_cache)
+ return -ENOENT;
+
+ cache = device->cache.gid_cache[p];
+ if (!cache)
+ return -ENOENT;
+
+ for (i = 0; i < cache->table_len; ++i) {
+ if (!memcmp(gid, &cache->table[i], sizeof(*gid))) {
+ if (index)
+ *index = i;
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+static int __ib_find_cached_gid(struct ib_device *device,
+ const union ib_gid *gid,
+ u8 *port_num,
+ u16 *index)
+{
unsigned long flags;
- int p, i;
+ u16 found_index;
+ int p;
int ret = -ENOENT;
- *port_num = -1;
+ if (port_num)
+ *port_num = -1;
if (index)
*index = -1;
read_lock_irqsave(&device->cache.lock, flags);
- for (p = 0; p <= rdma_end_port(device) - rdma_start_port(device); ++p) {
- cache = device->cache.gid_cache[p];
- for (i = 0; i < cache->table_len; ++i) {
- if (!memcmp(gid, &cache->table[i], sizeof *gid)) {
- *port_num = p + rdma_start_port(device);
- if (index)
- *index = i;
- ret = 0;
- goto found;
- }
+ for (p = rdma_start_port(device); p <= rdma_end_port(device); ++p) {
+ if (!___ib_find_cached_gid_by_port(device, p, gid,
+ &found_index)) {
+ if (port_num)
+ *port_num = p;
+ ret = 0;
+ break;
}
}
-found:
+
read_unlock_irqrestore(&device->cache.lock, flags);
+ if (!ret && index)
+ *index = found_index;
+
+ return ret;
+}
+
+int ib_find_cached_gid(struct ib_device *device,
+ const union ib_gid *gid,
+ u8 *port_num,
+ u16 *index)
+{
+ int ret = -ENOENT;
+
+ /* Look for a RoCE device with the specified GID. */
+ if (device->cache.roce_gid_table)
+ ret = roce_gid_table_find_gid(device, gid, NULL,
+ port_num, index);
+
+ /* If no RoCE devices with the specified GID, look for IB device. */
+ if (ret)
+ ret = __ib_find_cached_gid(device, gid, port_num, index);
+
return ret;
}
EXPORT_SYMBOL(ib_find_cached_gid);
@@ -127,22 +196,23 @@ int ib_get_cached_pkey(struct ib_device *device,
{
struct ib_pkey_cache *cache;
unsigned long flags;
- int ret = 0;
+ int ret = -ENOENT;
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
+ if (!device->cache.pkey_cache)
+ return -ENOENT;
+
read_lock_irqsave(&device->cache.lock, flags);
cache = device->cache.pkey_cache[port_num - rdma_start_port(device)];
-
- if (index < 0 || index >= cache->table_len)
- ret = -EINVAL;
- else
+ if (cache && index >= 0 && index < cache->table_len) {
*pkey = cache->table[index];
+ ret = 0;
+ }
read_unlock_irqrestore(&device->cache.lock, flags);
-
return ret;
}
EXPORT_SYMBOL(ib_get_cached_pkey);
@@ -161,9 +231,14 @@ int ib_find_cached_pkey(struct ib_device *device,
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
+ if (!device->cache.pkey_cache)
+ return -ENOENT;
+
read_lock_irqsave(&device->cache.lock, flags);
cache = device->cache.pkey_cache[port_num - rdma_start_port(device)];
+ if (!cache)
+ goto out;
*index = -1;
@@ -182,8 +257,8 @@ int ib_find_cached_pkey(struct ib_device *device,
ret = 0;
}
+out:
read_unlock_irqrestore(&device->cache.lock, flags);
-
return ret;
}
EXPORT_SYMBOL(ib_find_cached_pkey);
@@ -201,9 +276,14 @@ int ib_find_exact_cached_pkey(struct ib_device *device,
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
+ if (!device->cache.pkey_cache)
+ return -ENOENT;
+
read_lock_irqsave(&device->cache.lock, flags);
cache = device->cache.pkey_cache[port_num - rdma_start_port(device)];
+ if (!cache)
+ goto out;
*index = -1;
@@ -213,9 +293,8 @@ int ib_find_exact_cached_pkey(struct ib_device *device,
ret = 0;
break;
}
-
+out:
read_unlock_irqrestore(&device->cache.lock, flags);
-
return ret;
}
EXPORT_SYMBOL(ib_find_exact_cached_pkey);
@@ -225,13 +304,16 @@ int ib_get_cached_lmc(struct ib_device *device,
u8 *lmc)
{
unsigned long flags;
- int ret = 0;
+ int ret = -ENOENT;
if (port_num < rdma_start_port(device) || port_num > rdma_end_port(device))
return -EINVAL;
read_lock_irqsave(&device->cache.lock, flags);
- *lmc = device->cache.lmc_cache[port_num - rdma_start_port(device)];
+ if (device->cache.lmc_cache) {
+ *lmc = device->cache.lmc_cache[port_num - rdma_start_port(device)];
+ ret = 0;
+ }
read_unlock_irqrestore(&device->cache.lock, flags);
return ret;
@@ -243,9 +325,18 @@ static void ib_cache_update(struct ib_device *device,
{
struct ib_port_attr *tprops = NULL;
struct ib_pkey_cache *pkey_cache = NULL, *old_pkey_cache;
- struct ib_gid_cache *gid_cache = NULL, *old_gid_cache;
+ struct ib_gid_cache *gid_cache = NULL, *old_gid_cache = NULL;
int i;
int ret;
+ bool use_roce_gid_table =
+ rdma_cap_roce_gid_table(device, port);
+
+ if (port < rdma_start_port(device) || port > rdma_end_port(device))
+ return;
+
+ if (!(device->cache.pkey_cache && device->cache.gid_cache &&
+ device->cache.lmc_cache))
+ return;
tprops = kmalloc(sizeof *tprops, GFP_KERNEL);
if (!tprops)
@@ -265,12 +356,14 @@ static void ib_cache_update(struct ib_device *device,
pkey_cache->table_len = tprops->pkey_tbl_len;
- gid_cache = kmalloc(sizeof *gid_cache + tprops->gid_tbl_len *
- sizeof *gid_cache->table, GFP_KERNEL);
- if (!gid_cache)
- goto err;
+ if (!use_roce_gid_table) {
+ gid_cache = kmalloc(sizeof(*gid_cache) + tprops->gid_tbl_len *
+ sizeof(*gid_cache->table), GFP_KERNEL);
+ if (!gid_cache)
+ goto err;
- gid_cache->table_len = tprops->gid_tbl_len;
+ gid_cache->table_len = tprops->gid_tbl_len;
+ }
for (i = 0; i < pkey_cache->table_len; ++i) {
ret = ib_query_pkey(device, port, i, pkey_cache->table + i);
@@ -281,22 +374,28 @@ static void ib_cache_update(struct ib_device *device,
}
}
- for (i = 0; i < gid_cache->table_len; ++i) {
- ret = ib_query_gid(device, port, i, gid_cache->table + i);
- if (ret) {
- printk(KERN_WARNING "ib_query_gid failed (%d) for %s (index %d)\n",
- ret, device->name, i);
- goto err;
+ if (!use_roce_gid_table) {
+ for (i = 0; i < gid_cache->table_len; ++i) {
+ ret = ib_query_gid(device, port, i,
+ gid_cache->table + i);
+ if (ret) {
+ printk(KERN_WARNING "ib_query_gid failed (%d) for %s (index %d)\n",
+ ret, device->name, i);
+ goto err;
+ }
}
}
write_lock_irq(&device->cache.lock);
old_pkey_cache = device->cache.pkey_cache[port - rdma_start_port(device)];
- old_gid_cache = device->cache.gid_cache [port - rdma_start_port(device)];
+ if (!use_roce_gid_table)
+ old_gid_cache =
+ device->cache.gid_cache[port - rdma_start_port(device)];
device->cache.pkey_cache[port - rdma_start_port(device)] = pkey_cache;
- device->cache.gid_cache [port - rdma_start_port(device)] = gid_cache;
+ if (!use_roce_gid_table)
+ device->cache.gid_cache[port - rdma_start_port(device)] = gid_cache;
device->cache.lmc_cache[port - rdma_start_port(device)] = tprops->lmc;
@@ -392,12 +491,19 @@ err:
kfree(device->cache.pkey_cache);
kfree(device->cache.gid_cache);
kfree(device->cache.lmc_cache);
+ device->cache.pkey_cache = NULL;
+ device->cache.gid_cache = NULL;
+ device->cache.lmc_cache = NULL;
}
static void ib_cache_cleanup_one(struct ib_device *device)
{
int p;
+ if (!(device->cache.pkey_cache && device->cache.gid_cache &&
+ device->cache.lmc_cache))
+ return;
+
ib_unregister_event_handler(&device->cache.event_handler);
flush_workqueue(ib_wq);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 84edb9a..f9c6935 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -40,6 +40,7 @@
#include <linux/mutex.h>
#include <rdma/rdma_netlink.h>
#include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
#include "core_priv.h"
@@ -593,6 +594,10 @@ EXPORT_SYMBOL(ib_query_port);
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid)
{
+ if (rdma_cap_roce_gid_table(device, port_num))
+ return roce_gid_table_get_gid(device, port_num, index, gid,
+ NULL);
+
return device->query_gid(device, port_num, index, gid);
}
EXPORT_SYMBOL(ib_query_gid);
@@ -738,18 +743,26 @@ EXPORT_SYMBOL(ib_modify_port);
* a specified GID value occurs.
* @device: The device to query.
* @gid: The GID value to search for.
+ * @ndev: In RoCE, the net device of the device. Null means ignore.
* @port_num: The port number of the device where the GID value was found.
* @index: The index into the GID table where the GID was found. This
* parameter may be NULL.
*/
int ib_find_gid(struct ib_device *device, union ib_gid *gid,
- u8 *port_num, u16 *index)
+ struct net_device *ndev, u8 *port_num, u16 *index)
{
union ib_gid tmp_gid;
int ret, port, i;
+ if (device->cache.roce_gid_table &&
+ !roce_gid_table_find_gid(device, gid, ndev, port_num, index))
+ return 0;
+
for (port = rdma_start_port(device); port <= rdma_end_port(device); ++port) {
+ if (rdma_cap_roce_gid_table(device, port))
+ continue;
+
for (i = 0; i < device->port_immutable[port].gid_tbl_len; ++i) {
ret = ib_query_gid(device, port, i, &tmp_gid);
if (ret)
return ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1f918b0..4806d8b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2093,6 +2093,25 @@ static inline bool rdma_cap_read_multi_sge(struct ib_device *device,
return !(device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_IWARP);
}
+/**
+ * rdma_cap_roce_gid_table - Check if the port of device uses roce_gid_table
+ * @device: Device to check
+ * @port_num: Port number to check
+ *
+ * RoCE GID table mechanism manages the various GIDs for a device.
+ *
+ * NOTE: if allocating the port's GID table has failed, this call will still
+ * return true, but any RoCE GID table API will fail.
+ *
+ * Return: true if the port uses RoCE GID table mechanism in order to manage
+ * its GIDs.
+ */
+static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
+ u8 port_num)
+{
+ return rdma_protocol_roce(device, port_num) && device->cache.roce_gid_table;
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);
@@ -2108,7 +2127,7 @@ int ib_modify_port(struct ib_device *device,
struct ib_port_modify *port_modify);
int ib_find_gid(struct ib_device *device, union ib_gid *gid,
- u8 *port_num, u16 *index);
+ struct net_device *ndev, u8 *port_num, u16 *index);
int ib_find_pkey(struct ib_device *device,
u8 port_num, u16 pkey, u16 *index);
--
2.1.0
* [PATCH for-next V5 09/12] net/mlx4: Postpone the registration of net_device
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (7 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 08/12] IB/core: ib_cache routines should use roce_gid_table when needed Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 10/12] IB/mlx4: Implement ib_device callbacks Matan Barak
` (3 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
The mlx4 network driver registered its net_devices in the context of the
'add' function of the core driver (called when the HW should be registered).
As a result, the NETDEV_REGISTER event was sent in a context where the
get_protocol_dev() callback still returns NULL, which may confuse
listeners of netdev events.
Move netdev creation and notifier registration into a new 'activate'
callback, which is invoked after the interface context has been added to
the device's context list.
This patch prepares for the patch that implements the
get_netdev() callback in the IB/mlx4 driver.
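The ordering described above can be illustrated with a small user-space
sketch. It is not the mlx4 code: every demo_* name is hypothetical, and a
single pointer stands in for the per-device context list. It only shows why
the context is published before activate() runs, so that a lookup made while
activate() is emitting events already succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-phase interface: add() builds a context, activate() runs
 * only after that context has been published. */
struct demo_interface {
	void *(*add)(void);
	void (*activate)(void *context);	/* optional, may be NULL */
};

static void *demo_published_ctx;	/* stands in for the context list */
static int demo_lookup_hits;		/* successful lookups from activate() */

static void *demo_lookup(void)
{
	return demo_published_ctx;
}

static void *demo_add_device(const struct demo_interface *intf)
{
	void *ctx;

	assert(intf && intf->add);
	ctx = intf->add();
	if (!ctx)
		return NULL;

	demo_published_ctx = ctx;	/* publish first ... */
	if (intf->activate)
		intf->activate(ctx);	/* ... then let the client emit events */
	return ctx;
}

/* A trivial client used to exercise the ordering. */
static int demo_ctx_storage;

static void *demo_add(void)
{
	return &demo_ctx_storage;
}

static void demo_activate(void *context)
{
	if (demo_lookup() == context)
		demo_lookup_hits++;
}
```

With this ordering, a client whose activate() triggers a lookup sees its own
context; a client without an activate() callback is simply added as before.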
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 ++++++++++++++++------------
drivers/net/ethernet/mellanox/mlx4/intf.c | 3 +++
include/linux/mlx4/driver.h | 1 +
3 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 913b716..a946e4b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -224,6 +224,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
kfree(mdev);
}
+static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
+{
+ int i;
+ struct mlx4_en_dev *mdev = ctx;
+
+ /* Create a netdev for each port */
+ mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+ mlx4_info(mdev, "Activating port:%d\n", i);
+ if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
+ mdev->pndev[i] = NULL;
+ }
+
+ /* register notifier */
+ mdev->nb.notifier_call = mlx4_en_netdev_event;
+ if (register_netdevice_notifier(&mdev->nb)) {
+ mdev->nb.notifier_call = NULL;
+ mlx4_err(mdev, "Failed to create notifier\n");
+ }
+}
+
static void *mlx4_en_add(struct mlx4_dev *dev)
{
struct mlx4_en_dev *mdev;
@@ -297,21 +317,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
mutex_init(&mdev->state_lock);
mdev->device_up = true;
- /* Setup ports */
-
- /* Create a netdev for each port */
- mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
- mlx4_info(mdev, "Activating port:%d\n", i);
- if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
- mdev->pndev[i] = NULL;
- }
- /* register notifier */
- mdev->nb.notifier_call = mlx4_en_netdev_event;
- if (register_netdevice_notifier(&mdev->nb)) {
- mdev->nb.notifier_call = NULL;
- mlx4_err(mdev, "Failed to create notifier\n");
- }
-
return mdev;
err_mr:
@@ -335,6 +340,7 @@ static struct mlx4_interface mlx4_en_interface = {
.event = mlx4_en_event,
.get_dev = mlx4_en_get_netdev,
.protocol = MLX4_PROT_ETH,
+ .activate = mlx4_en_activate,
};
static void mlx4_en_verify_params(void)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 6fce587..09e94c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
spin_lock_irq(&priv->ctx_lock);
list_add_tail(&dev_ctx->list, &priv->ctx_list);
spin_unlock_irq(&priv->ctx_lock);
+ if (intf->activate)
+ intf->activate(&priv->dev, dev_ctx->context);
} else
kfree(dev_ctx);
+
}
static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 9553a73..5a06d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -59,6 +59,7 @@ struct mlx4_interface {
void (*event) (struct mlx4_dev *dev, void *context,
enum mlx4_dev_event event, unsigned long param);
void * (*get_dev)(struct mlx4_dev *dev, void *context, u8 port);
+ void (*activate)(struct mlx4_dev *dev, void *context);
struct list_head list;
enum mlx4_protocol protocol;
int flags;
--
2.1.0
* [PATCH for-next V5 10/12] IB/mlx4: Implement ib_device callbacks
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (8 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 09/12] net/mlx4: Postpone the registration of net_device Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
[not found] ` <1433772735-22416-11-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 14:12 ` [PATCH for-next V5 11/12] IB/mlx4: Replace mechanism for RoCE GID management Matan Barak
` (2 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.
modify_gid: called on a change in the RoCE GID cache. Handle this by writing to
the hardware GID table. Indexes in the cache and hardware tables may not match,
so a translation is required when modifying a QP or creating an address handle.
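The reference-counted slot scheme behind modify_gid can be sketched in user
space as follows. This is a simplified illustration under stated assumptions,
not the driver code: all demo_* names and the table size are hypothetical,
locking is omitted, and the "hardware write" is reduced to an hw_update flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_GIDS 4	/* illustrative size, not the hardware limit */

struct demo_gid { unsigned char raw[16]; };
struct demo_ctx { int real_index; int refcount; };
struct demo_entry { struct demo_gid gid; struct demo_ctx *ctx; };
struct demo_table { struct demo_entry e[DEMO_MAX_GIDS]; };

static const struct demo_gid demo_zgid = { { 0 } };	/* marks a free slot */

/* Add a reference to gid; *hw_update is set when the table content changed. */
static int demo_add_gid(struct demo_table *t, const struct demo_gid *gid,
			struct demo_ctx **out, int *hw_update)
{
	int i, free_slot = -1;

	assert(t && gid && out && hw_update);
	*hw_update = 0;
	if (!memcmp(gid, &demo_zgid, sizeof(*gid)))
		return -1;	/* the zero GID is reserved for "empty" */

	for (i = 0; i < DEMO_MAX_GIDS; i++) {
		if (!memcmp(&t->e[i].gid, gid, sizeof(*gid))) {
			t->e[i].ctx->refcount++;	/* already present */
			*out = t->e[i].ctx;
			return 0;
		}
		if (free_slot < 0 &&
		    !memcmp(&t->e[i].gid, &demo_zgid, sizeof(*gid)))
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;	/* table full */

	t->e[free_slot].ctx = malloc(sizeof(*t->e[free_slot].ctx));
	if (!t->e[free_slot].ctx)
		return -1;
	t->e[free_slot].gid = *gid;
	t->e[free_slot].ctx->real_index = free_slot;
	t->e[free_slot].ctx->refcount = 1;
	*out = t->e[free_slot].ctx;
	*hw_update = 1;
	return 0;
}

/* Drop one reference; the slot is cleared only when the last user leaves. */
static void demo_del_gid(struct demo_table *t, struct demo_ctx *ctx,
			 int *hw_update)
{
	int idx;

	assert(t && ctx && hw_update);
	*hw_update = 0;
	if (--ctx->refcount)
		return;
	idx = ctx->real_index;
	t->e[idx].gid = demo_zgid;
	t->e[idx].ctx = NULL;
	free(ctx);
	*hw_update = 1;
}

/* Translate a GID to its slot, or -1 when it is not present. */
static int demo_real_index(const struct demo_table *t,
			   const struct demo_gid *gid)
{
	int i;

	for (i = 0; i < DEMO_MAX_GIDS; i++)
		if (t->e[i].ctx && !memcmp(&t->e[i].gid, gid, sizeof(*gid)))
			return t->e[i].ctx->real_index;
	return -1;
}
```

Because duplicate GIDs share one slot, the slot number handed back through the
context can differ from the index the caller used in its own cache, which is
why a lookup such as demo_real_index() is needed before programming hardware.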
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/hw/mlx4/main.c | 213 ++++++++++++++++++++++++++++++++++-
drivers/infiniband/hw/mlx4/mlx4_ib.h | 17 +++
include/linux/mlx4/device.h | 3 +-
3 files changed, 229 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 69ae464..bf38e32 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -45,6 +45,9 @@
#include <rdma/ib_smi.h>
#include <rdma/ib_user_verbs.h>
#include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
+
+#include <net/bonding.h>
#include <linux/mlx4/driver.h>
#include <linux/mlx4/cmd.h>
@@ -129,6 +132,199 @@ static int num_ib_ports(struct mlx4_dev *dev)
return ib_ports;
}
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+ struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+ if (mlx4_is_bonded(ibdev->dev)) {
+ struct net_device *dev;
+ struct net_device *upper = NULL;
+
+ rcu_read_lock();
+
+ dev = mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+ if (dev)
+ upper = netdev_master_upper_dev_get_rcu(dev);
+ else
+ goto unlock;
+ if (upper)
+ dev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+unlock:
+ rcu_read_unlock();
+
+ return dev;
+ }
+
+ return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
+static int mlx4_ib_update_gids(struct gid_entry *gids,
+ struct mlx4_ib_dev *ibdev,
+ u8 port_num)
+{
+ struct mlx4_cmd_mailbox *mailbox;
+ int err;
+ struct mlx4_dev *dev = ibdev->dev;
+ int i;
+ union ib_gid *gid_tbl;
+
+ mailbox = mlx4_alloc_cmd_mailbox(dev);
+ if (IS_ERR(mailbox))
+ return -ENOMEM;
+
+ gid_tbl = mailbox->buf;
+
+ for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+ memcpy(&gid_tbl[i], &gids[i].gid, sizeof(union ib_gid));
+
+ err = mlx4_cmd(dev, mailbox->dma,
+ MLX4_SET_PORT_GID_TABLE << 8 | port_num,
+ 1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+ MLX4_CMD_WRAPPED);
+ if (mlx4_is_bonded(dev))
+ err += mlx4_cmd(dev, mailbox->dma,
+ MLX4_SET_PORT_GID_TABLE << 8 | 2,
+ 1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+ MLX4_CMD_WRAPPED);
+
+ mlx4_free_cmd_mailbox(dev, mailbox);
+ return err;
+}
+
+static int mlx4_ib_modify_gid(struct ib_device *device,
+ u8 port_num, unsigned int index,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *attr,
+ void **context)
+{
+ struct mlx4_ib_dev *ibdev = to_mdev(device);
+ struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+ struct mlx4_port_gid_table *port_gid_table;
+ int free = -1, found = -1;
+ int ret = 0;
+ int clear = !memcmp(&zgid, gid, sizeof(*gid));
+ int hw_update = 0;
+ int i;
+ struct gid_entry *gids = NULL;
+
+ if (!rdma_cap_roce_gid_table(device, port_num))
+ return -EINVAL;
+
+ if (port_num > MLX4_MAX_PORTS)
+ return -EINVAL;
+
+ if (!context)
+ return -EINVAL;
+
+ spin_lock_bh(&iboe->lock);
+ port_gid_table = &iboe->gids[port_num - 1];
+
+ if (clear) {
+ struct gid_cache_context *ctx = *context;
+
+ if (ctx) {
+ ctx->refcount--;
+ if (!ctx->refcount) {
+ unsigned int real_index = ctx->real_index;
+
+ memcpy(&port_gid_table->gids[real_index].gid, &zgid, sizeof(*gid));
+ kfree(port_gid_table->gids[real_index].ctx);
+ port_gid_table->gids[real_index].ctx = NULL;
+ hw_update = 1;
+ }
+ }
+ } else {
+ for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i) {
+ if (!memcmp(&port_gid_table->gids[i].gid, gid, sizeof(*gid))) {
+ found = i;
+ break;
+ }
+ if (free < 0 && !memcmp(&port_gid_table->gids[i].gid, &zgid, sizeof(*gid)))
+ free = i; /* HW has space */
+ }
+
+ if (found < 0) {
+ if (free < 0) {
+ ret = -ENOSPC;
+ } else {
+ port_gid_table->gids[free].ctx = kmalloc(sizeof(*port_gid_table->gids[free].ctx), GFP_ATOMIC);
+ if (!port_gid_table->gids[free].ctx) {
+ ret = -ENOMEM;
+ } else {
+ *context = port_gid_table->gids[free].ctx;
+ memcpy(&port_gid_table->gids[free].gid, gid, sizeof(*gid));
+ port_gid_table->gids[free].ctx->real_index = free;
+ port_gid_table->gids[free].ctx->refcount = 1;
+ hw_update = 1;
+ }
+ }
+ } else {
+ struct gid_cache_context *ctx = port_gid_table->gids[found].ctx;
+ *context = ctx;
+ ctx->refcount++;
+ }
+ }
+ if (!ret && hw_update) {
+ gids = kmalloc(sizeof(*gids) * MLX4_MAX_PORT_GIDS, GFP_ATOMIC);
+ if (!gids) {
+ ret = -ENOMEM;
+ } else {
+ for (i = 0; i < MLX4_MAX_PORT_GIDS; i++)
+ memcpy(&gids[i].gid, &port_gid_table->gids[i].gid, sizeof(union ib_gid));
+ }
+ }
+ spin_unlock_bh(&iboe->lock);
+
+ if (!ret && hw_update) {
+ ret = mlx4_ib_update_gids(gids, ibdev, port_num);
+ kfree(gids);
+ }
+
+ return ret;
+}
+
+int mlx4_ib_gid_index_to_real_index(struct mlx4_ib_dev *ibdev,
+ u8 port_num, int index)
+{
+ struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+ struct gid_cache_context *ctx = NULL;
+ union ib_gid gid;
+ struct mlx4_port_gid_table *port_gid_table;
+ int real_index = -EINVAL;
+ int i;
+ int ret;
+ unsigned long flags;
+
+ if (port_num > MLX4_MAX_PORTS)
+ return -EINVAL;
+
+ if (mlx4_is_bonded(ibdev->dev))
+ port_num = 1;
+
+ if (!rdma_cap_roce_gid_table(&ibdev->ib_dev, port_num))
+ return index;
+
+ ret = ib_get_cached_gid(&ibdev->ib_dev, port_num, index, &gid);
+ if (ret)
+ return ret;
+
+ if (!memcmp(&gid, &zgid, sizeof(gid)))
+ return -EINVAL;
+
+ spin_lock_irqsave(&iboe->lock, flags);
+ port_gid_table = &iboe->gids[port_num - 1];
+
+ for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+ if (!memcmp(&port_gid_table->gids[i].gid, &gid, sizeof(gid))) {
+ ctx = port_gid_table->gids[i].ctx;
+ break;
+ }
+ if (ctx)
+ real_index = ctx->real_index;
+ spin_unlock_irqrestore(&iboe->lock, flags);
+ return real_index;
+}
+
static int mlx4_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props)
{
@@ -477,11 +673,22 @@ out:
static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid)
{
- struct mlx4_ib_dev *dev = to_mdev(ibdev);
+ int ret;
- *gid = dev->iboe.gid_table[port - 1][index];
+ if (!rdma_cap_roce_gid_table(ibdev, port)) {
+ struct mlx4_ib_dev *dev = to_mdev(ibdev);
- return 0;
+ *gid = dev->iboe.gid_table[port - 1][index];
+ return 0;
+ }
+
+ ret = ib_get_cached_gid(ibdev, port, index, gid);
+ if (ret == -EAGAIN) {
+ memcpy(gid, &zgid, sizeof(*gid));
+ return 0;
+ }
+
+ return ret;
}
static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 645d55e..c870ddb 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -456,6 +456,20 @@ struct mlx4_ib_sriov {
struct idr pv_id_table;
};
+struct gid_cache_context {
+ int real_index;
+ int refcount;
+};
+
+struct gid_entry {
+ union ib_gid gid;
+ struct gid_cache_context *ctx;
+};
+
+struct mlx4_port_gid_table {
+ struct gid_entry gids[MLX4_MAX_PORT_GIDS];
+};
+
struct mlx4_ib_iboe {
spinlock_t lock;
struct net_device *netdevs[MLX4_MAX_PORTS];
@@ -465,6 +479,7 @@ struct mlx4_ib_iboe {
struct notifier_block nb_inet;
struct notifier_block nb_inet6;
union ib_gid gid_table[MLX4_MAX_PORTS][128];
+ struct mlx4_port_gid_table gids[MLX4_MAX_PORTS];
};
struct pkey_mgt {
@@ -815,5 +830,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
u64 start, u64 length, u64 virt_addr,
int mr_access_flags, struct ib_pd *pd,
struct ib_udata *udata);
+int mlx4_ib_gid_index_to_real_index(struct mlx4_ib_dev *ibdev,
+ u8 port_num, int index);
#endif /* MLX4_IB_H */
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 83e80ab..d439949 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -78,7 +78,8 @@ enum {
enum {
MLX4_MAX_PORTS = 2,
- MLX4_MAX_PORT_PKEYS = 128
+ MLX4_MAX_PORT_PKEYS = 128,
+ MLX4_MAX_PORT_GIDS = 128
};
/* base qkey for use in sriov tunnel-qp/proxy-qp communication.
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 45+ messages in thread* [PATCH for-next V5 11/12] IB/mlx4: Replace mechanism for RoCE GID management
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (9 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 10/12] IB/mlx4: Implement ib_device callbacks Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
2015-06-08 14:12 ` [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core Matan Barak
2015-06-08 21:37 ` [PATCH for-next V5 00/12] Move RoCE GID management " Hefty, Sean
12 siblings, 0 replies; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.
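As a hedged illustration of the translation step, a caller could validate the
translated index before narrowing it into a byte-sized hardware field. The
sketch below is user-space C with hypothetical demo_* names; the lookup table
stands in for whatever mapping the driver keeps, and a negative value models
an error return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address-vector fragment with a one-byte GID index field. */
struct demo_av { uint8_t gid_index; };

/* Hypothetical translation: returns the hardware slot for a cache index,
 * or a negative value when the index is out of range or the slot is empty. */
static int demo_translate(const int *map, int map_len, int cache_index)
{
	assert(map);
	if (cache_index < 0 || cache_index >= map_len)
		return -1;
	return map[cache_index];	/* may be negative for an empty slot */
}

/* Store the translated index only when it is valid and fits in a byte. */
static int demo_set_av_gid_index(struct demo_av *av, const int *map,
				 int map_len, int cache_index)
{
	int real = demo_translate(map, map_len, cache_index);

	if (real < 0 || real > UINT8_MAX)
		return -1;
	av->gid_index = (uint8_t)real;
	return 0;
}
```

The design point is only that an error value is rejected before the
assignment, so it is never truncated into something that looks like a valid
index.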
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/hw/mlx4/ah.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 510 ++---------------------------------
drivers/infiniband/hw/mlx4/mlx4_ib.h | 4 -
drivers/infiniband/hw/mlx4/qp.c | 10 +-
4 files changed, 28 insertions(+), 498 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index f50a546..7ad6f96 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -89,7 +89,7 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
if (vlan_tag < 0x1000)
vlan_tag |= (ah_attr->sl & 7) << 13;
ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
- ah->av.eth.gid_index = ah_attr->grh.sgid_index;
+ ah->av.eth.gid_index = mlx4_ib_gid_index_to_real_index(ibdev, ah_attr->port_num, ah_attr->grh.sgid_index);
ah->av.eth.vlan = cpu_to_be16(vlan_tag);
if (ah_attr->static_rate) {
ah->av.eth.stat_rate = ah_attr->static_rate + MLX4_STAT_RATE_OFFSET;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf38e32..18708a7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -77,13 +77,6 @@ static const char mlx4_ib_version[] =
DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
-struct update_gid_work {
- struct work_struct work;
- union ib_gid gids[128];
- struct mlx4_ib_dev *dev;
- int port;
-};
-
static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);
static struct workqueue_struct *wq;
@@ -560,7 +553,8 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
props->active_width = (((u8 *)mailbox->buf)[5] == 0x40) ?
IB_WIDTH_4X : IB_WIDTH_1X;
props->active_speed = IB_SPEED_QDR;
- props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS;
+ props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS |
+ IB_PORT_ROCE;
props->gid_tbl_len = mdev->dev->caps.gid_table_len[port];
props->max_msg_sz = mdev->dev->caps.max_msg_sz;
props->pkey_tbl_len = 1;
@@ -569,12 +563,13 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
props->state = IB_PORT_DOWN;
props->phys_state = state_to_phys_state(props->state);
props->active_mtu = IB_MTU_256;
- if (is_bonded)
- rtnl_lock(); /* required to get upper dev */
spin_lock_bh(&iboe->lock);
ndev = iboe->netdevs[port - 1];
- if (ndev && is_bonded)
- ndev = netdev_master_upper_dev_get(ndev);
+ if (ndev && is_bonded) {
+ rcu_read_lock(); /* required to get upper dev */
+ ndev = netdev_master_upper_dev_get_rcu(ndev);
+ rcu_read_unlock();
+ }
if (!ndev)
goto out_unlock;
@@ -586,8 +581,6 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
props->phys_state = state_to_phys_state(props->state);
out_unlock:
spin_unlock_bh(&iboe->lock);
- if (is_bonded)
- rtnl_unlock();
out:
mlx4_free_cmd_mailbox(mdev->dev, mailbox);
return err;
@@ -670,17 +663,19 @@ out:
return err;
}
-static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
- union ib_gid *gid)
+static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
+ union ib_gid *gid)
{
int ret;
- if (!rdma_cap_roce_gid_table(ibdev, port)) {
- struct mlx4_ib_dev *dev = to_mdev(ibdev);
+ if (rdma_protocol_ib(ibdev, port))
+ return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
- *gid = dev->iboe.gid_table[port - 1][index];
- return 0;
- }
+ if (!rdma_protocol_roce(ibdev, port))
+ return -ENODEV;
+
+ if (!rdma_cap_roce_gid_table(ibdev, port))
+ return -ENODEV;
ret = ib_get_cached_gid(ibdev, port, index, gid);
if (ret == -EAGAIN) {
@@ -691,15 +686,6 @@ static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
return ret;
}
-static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
- union ib_gid *gid)
-{
- if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND)
- return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
- else
- return iboe_query_gid(ibdev, port, index, gid);
-}
-
int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
u16 *pkey, int netw_view)
{
@@ -1695,272 +1681,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
&dev_attr_board_id
};
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
- struct net_device *dev)
-{
- memcpy(eui, dev->dev_addr, 3);
- memcpy(eui + 5, dev->dev_addr + 3, 3);
- if (vlan_id < 0x1000) {
- eui[3] = vlan_id >> 8;
- eui[4] = vlan_id & 0xff;
- } else {
- eui[3] = 0xff;
- eui[4] = 0xfe;
- }
- eui[0] ^= 2;
-}
-
-static void update_gids_task(struct work_struct *work)
-{
- struct update_gid_work *gw = container_of(work, struct update_gid_work, work);
- struct mlx4_cmd_mailbox *mailbox;
- union ib_gid *gids;
- int err;
- struct mlx4_dev *dev = gw->dev->dev;
- int is_bonded = mlx4_is_bonded(dev);
-
- if (!gw->dev->ib_active)
- return;
-
- mailbox = mlx4_alloc_cmd_mailbox(dev);
- if (IS_ERR(mailbox)) {
- pr_warn("update gid table failed %ld\n", PTR_ERR(mailbox));
- return;
- }
-
- gids = mailbox->buf;
- memcpy(gids, gw->gids, sizeof gw->gids);
-
- err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | gw->port,
- MLX4_SET_PORT_ETH_OPCODE, MLX4_CMD_SET_PORT,
- MLX4_CMD_TIME_CLASS_B, MLX4_CMD_WRAPPED);
- if (err)
- pr_warn("set port command failed\n");
- else
- if ((gw->port == 1) || !is_bonded)
- mlx4_ib_dispatch_event(gw->dev,
- is_bonded ? 1 : gw->port,
- IB_EVENT_GID_CHANGE);
-
- mlx4_free_cmd_mailbox(dev, mailbox);
- kfree(gw);
-}
-
-static void reset_gids_task(struct work_struct *work)
-{
- struct update_gid_work *gw =
- container_of(work, struct update_gid_work, work);
- struct mlx4_cmd_mailbox *mailbox;
- union ib_gid *gids;
- int err;
- struct mlx4_dev *dev = gw->dev->dev;
-
- if (!gw->dev->ib_active)
- return;
-
- mailbox = mlx4_alloc_cmd_mailbox(dev);
- if (IS_ERR(mailbox)) {
- pr_warn("reset gid table failed\n");
- goto free;
- }
-
- gids = mailbox->buf;
- memcpy(gids, gw->gids, sizeof(gw->gids));
-
- if (mlx4_ib_port_link_layer(&gw->dev->ib_dev, gw->port) ==
- IB_LINK_LAYER_ETHERNET) {
- err = mlx4_cmd(dev, mailbox->dma,
- MLX4_SET_PORT_GID_TABLE << 8 | gw->port,
- MLX4_SET_PORT_ETH_OPCODE, MLX4_CMD_SET_PORT,
- MLX4_CMD_TIME_CLASS_B,
- MLX4_CMD_WRAPPED);
- if (err)
- pr_warn("set port %d command failed\n", gw->port);
- }
-
- mlx4_free_cmd_mailbox(dev, mailbox);
-free:
- kfree(gw);
-}
-
-static int update_gid_table(struct mlx4_ib_dev *dev, int port,
- union ib_gid *gid, int clear,
- int default_gid)
-{
- struct update_gid_work *work;
- int i;
- int need_update = 0;
- int free = -1;
- int found = -1;
- int max_gids;
-
- if (default_gid) {
- free = 0;
- } else {
- max_gids = dev->dev->caps.gid_table_len[port];
- for (i = 1; i < max_gids; ++i) {
- if (!memcmp(&dev->iboe.gid_table[port - 1][i], gid,
- sizeof(*gid)))
- found = i;
-
- if (clear) {
- if (found >= 0) {
- need_update = 1;
- dev->iboe.gid_table[port - 1][found] =
- zgid;
- break;
- }
- } else {
- if (found >= 0)
- break;
-
- if (free < 0 &&
- !memcmp(&dev->iboe.gid_table[port - 1][i],
- &zgid, sizeof(*gid)))
- free = i;
- }
- }
- }
-
- if (found == -1 && !clear && free >= 0) {
- dev->iboe.gid_table[port - 1][free] = *gid;
- need_update = 1;
- }
-
- if (!need_update)
- return 0;
-
- work = kzalloc(sizeof(*work), GFP_ATOMIC);
- if (!work)
- return -ENOMEM;
-
- memcpy(work->gids, dev->iboe.gid_table[port - 1], sizeof(work->gids));
- INIT_WORK(&work->work, update_gids_task);
- work->port = port;
- work->dev = dev;
- queue_work(wq, &work->work);
-
- return 0;
-}
-
-static void mlx4_make_default_gid(struct net_device *dev, union ib_gid *gid)
-{
- gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
- mlx4_addrconf_ifid_eui48(&gid->raw[8], 0xffff, dev);
-}
-
-
-static int reset_gid_table(struct mlx4_ib_dev *dev, u8 port)
-{
- struct update_gid_work *work;
-
- work = kzalloc(sizeof(*work), GFP_ATOMIC);
- if (!work)
- return -ENOMEM;
-
- memset(dev->iboe.gid_table[port - 1], 0, sizeof(work->gids));
- memset(work->gids, 0, sizeof(work->gids));
- INIT_WORK(&work->work, reset_gids_task);
- work->dev = dev;
- work->port = port;
- queue_work(wq, &work->work);
- return 0;
-}
-
-static int mlx4_ib_addr_event(int event, struct net_device *event_netdev,
- struct mlx4_ib_dev *ibdev, union ib_gid *gid)
-{
- struct mlx4_ib_iboe *iboe;
- int port = 0;
- struct net_device *real_dev = rdma_vlan_dev_real_dev(event_netdev) ?
- rdma_vlan_dev_real_dev(event_netdev) :
- event_netdev;
- union ib_gid default_gid;
-
- mlx4_make_default_gid(real_dev, &default_gid);
-
- if (!memcmp(gid, &default_gid, sizeof(*gid)))
- return 0;
-
- if (event != NETDEV_DOWN && event != NETDEV_UP)
- return 0;
-
- if ((real_dev != event_netdev) &&
- (event == NETDEV_DOWN) &&
- rdma_link_local_addr((struct in6_addr *)gid))
- return 0;
-
- iboe = &ibdev->iboe;
- spin_lock_bh(&iboe->lock);
-
- for (port = 1; port <= ibdev->dev->caps.num_ports; ++port)
- if ((netif_is_bond_master(real_dev) &&
- (real_dev == iboe->masters[port - 1])) ||
- (!netif_is_bond_master(real_dev) &&
- (real_dev == iboe->netdevs[port - 1])))
- update_gid_table(ibdev, port, gid,
- event == NETDEV_DOWN, 0);
-
- spin_unlock_bh(&iboe->lock);
- return 0;
-
-}
-
-static u8 mlx4_ib_get_dev_port(struct net_device *dev,
- struct mlx4_ib_dev *ibdev)
-{
- u8 port = 0;
- struct mlx4_ib_iboe *iboe;
- struct net_device *real_dev = rdma_vlan_dev_real_dev(dev) ?
- rdma_vlan_dev_real_dev(dev) : dev;
-
- iboe = &ibdev->iboe;
-
- for (port = 1; port <= ibdev->dev->caps.num_ports; ++port)
- if ((netif_is_bond_master(real_dev) &&
- (real_dev == iboe->masters[port - 1])) ||
- (!netif_is_bond_master(real_dev) &&
- (real_dev == iboe->netdevs[port - 1])))
- break;
-
- if ((port == 0) || (port > ibdev->dev->caps.num_ports))
- return 0;
- else
- return port;
-}
-
-static int mlx4_ib_inet_event(struct notifier_block *this, unsigned long event,
- void *ptr)
-{
- struct mlx4_ib_dev *ibdev;
- struct in_ifaddr *ifa = ptr;
- union ib_gid gid;
- struct net_device *event_netdev = ifa->ifa_dev->dev;
-
- ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
-
- ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet);
-
- mlx4_ib_addr_event(event, event_netdev, ibdev, &gid);
- return NOTIFY_DONE;
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static int mlx4_ib_inet6_event(struct notifier_block *this, unsigned long event,
- void *ptr)
-{
- struct mlx4_ib_dev *ibdev;
- struct inet6_ifaddr *ifa = ptr;
- union ib_gid *gid = (union ib_gid *)&ifa->addr;
- struct net_device *event_netdev = ifa->idev->dev;
-
- ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet6);
-
- mlx4_ib_addr_event(event, event_netdev, ibdev, gid);
- return NOTIFY_DONE;
-}
-#endif
-
#define MLX4_IB_INVALID_MAC ((u64)-1)
static void mlx4_ib_update_qps(struct mlx4_ib_dev *ibdev,
struct net_device *dev,
@@ -2019,94 +1739,6 @@ unlock:
mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
}
-static void mlx4_ib_get_dev_addr(struct net_device *dev,
- struct mlx4_ib_dev *ibdev, u8 port)
-{
- struct in_device *in_dev;
-#if IS_ENABLED(CONFIG_IPV6)
- struct inet6_dev *in6_dev;
- union ib_gid *pgid;
- struct inet6_ifaddr *ifp;
- union ib_gid default_gid;
-#endif
- union ib_gid gid;
-
-
- if ((port == 0) || (port > ibdev->dev->caps.num_ports))
- return;
-
- /* IPv4 gids */
- in_dev = in_dev_get(dev);
- if (in_dev) {
- for_ifa(in_dev) {
- /*ifa->ifa_address;*/
- ipv6_addr_set_v4mapped(ifa->ifa_address,
- (struct in6_addr *)&gid);
- update_gid_table(ibdev, port, &gid, 0, 0);
- }
- endfor_ifa(in_dev);
- in_dev_put(in_dev);
- }
-#if IS_ENABLED(CONFIG_IPV6)
- mlx4_make_default_gid(dev, &default_gid);
- /* IPv6 gids */
- in6_dev = in6_dev_get(dev);
- if (in6_dev) {
- read_lock_bh(&in6_dev->lock);
- list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
- pgid = (union ib_gid *)&ifp->addr;
- if (!memcmp(pgid, &default_gid, sizeof(*pgid)))
- continue;
- update_gid_table(ibdev, port, pgid, 0, 0);
- }
- read_unlock_bh(&in6_dev->lock);
- in6_dev_put(in6_dev);
- }
-#endif
-}
-
-static void mlx4_ib_set_default_gid(struct mlx4_ib_dev *ibdev,
- struct net_device *dev, u8 port)
-{
- union ib_gid gid;
- mlx4_make_default_gid(dev, &gid);
- update_gid_table(ibdev, port, &gid, 0, 1);
-}
-
-static int mlx4_ib_init_gid_table(struct mlx4_ib_dev *ibdev)
-{
- struct net_device *dev;
- struct mlx4_ib_iboe *iboe = &ibdev->iboe;
- int i;
- int err = 0;
-
- for (i = 1; i <= ibdev->num_ports; ++i) {
- if (rdma_port_get_link_layer(&ibdev->ib_dev, i) ==
- IB_LINK_LAYER_ETHERNET) {
- err = reset_gid_table(ibdev, i);
- if (err)
- goto out;
- }
- }
-
- read_lock(&dev_base_lock);
- spin_lock_bh(&iboe->lock);
-
- for_each_netdev(&init_net, dev) {
- u8 port = mlx4_ib_get_dev_port(dev, ibdev);
- /* port will be non-zero only for ETH ports */
- if (port) {
- mlx4_ib_set_default_gid(ibdev, dev, port);
- mlx4_ib_get_dev_addr(dev, ibdev, port);
- }
- }
-
- spin_unlock_bh(&iboe->lock);
- read_unlock(&dev_base_lock);
-out:
- return err;
-}
-
static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
struct net_device *dev,
unsigned long event)
@@ -2116,81 +1748,22 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
int update_qps_port = -1;
int port;
+ ASSERT_RTNL();
+
iboe = &ibdev->iboe;
spin_lock_bh(&iboe->lock);
mlx4_foreach_ib_transport_port(port, ibdev->dev) {
- enum ib_port_state port_state = IB_PORT_NOP;
- struct net_device *old_master = iboe->masters[port - 1];
- struct net_device *curr_netdev;
- struct net_device *curr_master;
iboe->netdevs[port - 1] =
mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port);
- if (iboe->netdevs[port - 1])
- mlx4_ib_set_default_gid(ibdev,
- iboe->netdevs[port - 1], port);
- curr_netdev = iboe->netdevs[port - 1];
-
- if (iboe->netdevs[port - 1] &&
- netif_is_bond_slave(iboe->netdevs[port - 1])) {
- iboe->masters[port - 1] = netdev_master_upper_dev_get(
- iboe->netdevs[port - 1]);
- } else {
- iboe->masters[port - 1] = NULL;
- }
- curr_master = iboe->masters[port - 1];
if (dev == iboe->netdevs[port - 1] &&
(event == NETDEV_CHANGEADDR || event == NETDEV_REGISTER ||
event == NETDEV_UP || event == NETDEV_CHANGE))
update_qps_port = port;
- if (curr_netdev) {
- port_state = (netif_running(curr_netdev) && netif_carrier_ok(curr_netdev)) ?
- IB_PORT_ACTIVE : IB_PORT_DOWN;
- mlx4_ib_set_default_gid(ibdev, curr_netdev, port);
- if (curr_master) {
- /* if using bonding/team and a slave port is down, we
- * don't want the bond IP based gids in the table since
- * flows that select port by gid may get the down port.
- */
- if (port_state == IB_PORT_DOWN &&
- !mlx4_is_bonded(ibdev->dev)) {
- reset_gid_table(ibdev, port);
- mlx4_ib_set_default_gid(ibdev,
- curr_netdev,
- port);
- } else {
- /* gids from the upper dev (bond/team)
- * should appear in port's gid table
- */
- mlx4_ib_get_dev_addr(curr_master,
- ibdev, port);
- }
- }
- /* if bonding is used it is possible that we add it to
- * masters only after IP address is assigned to the
- * net bonding interface.
- */
- if (curr_master && (old_master != curr_master)) {
- reset_gid_table(ibdev, port);
- mlx4_ib_set_default_gid(ibdev,
- curr_netdev, port);
- mlx4_ib_get_dev_addr(curr_master, ibdev, port);
- }
-
- if (!curr_master && (old_master != curr_master)) {
- reset_gid_table(ibdev, port);
- mlx4_ib_set_default_gid(ibdev,
- curr_netdev, port);
- mlx4_ib_get_dev_addr(curr_netdev, ibdev, port);
- }
- } else {
- reset_gid_table(ibdev, port);
- }
}
-
spin_unlock_bh(&iboe->lock);
if (update_qps_port > 0)
@@ -2394,6 +1967,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
1 : ibdev->num_ports;
ibdev->ib_dev.num_comp_vectors = dev->caps.num_comp_vectors;
ibdev->ib_dev.dma_device = &dev->persist->pdev->dev;
+ ibdev->ib_dev.get_netdev = mlx4_ib_get_netdev;
+ ibdev->ib_dev.modify_gid = mlx4_ib_modify_gid;
if (dev->caps.userspace_caps)
ibdev->ib_dev.uverbs_abi_ver = MLX4_IB_UVERBS_ABI_VERSION;
@@ -2588,26 +2163,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
goto err_notif;
}
}
- if (!iboe->nb_inet.notifier_call) {
- iboe->nb_inet.notifier_call = mlx4_ib_inet_event;
- err = register_inetaddr_notifier(&iboe->nb_inet);
- if (err) {
- iboe->nb_inet.notifier_call = NULL;
- goto err_notif;
- }
- }
-#if IS_ENABLED(CONFIG_IPV6)
- if (!iboe->nb_inet6.notifier_call) {
- iboe->nb_inet6.notifier_call = mlx4_ib_inet6_event;
- err = register_inet6addr_notifier(&iboe->nb_inet6);
- if (err) {
- iboe->nb_inet6.notifier_call = NULL;
- goto err_notif;
- }
- }
-#endif
- if (mlx4_ib_init_gid_table(ibdev))
- goto err_notif;
}
for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
@@ -2638,18 +2193,6 @@ err_notif:
pr_warn("failure unregistering notifier\n");
ibdev->iboe.nb.notifier_call = NULL;
}
- if (ibdev->iboe.nb_inet.notifier_call) {
- if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
- pr_warn("failure unregistering notifier\n");
- ibdev->iboe.nb_inet.notifier_call = NULL;
- }
-#if IS_ENABLED(CONFIG_IPV6)
- if (ibdev->iboe.nb_inet6.notifier_call) {
- if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
- pr_warn("failure unregistering notifier\n");
- ibdev->iboe.nb_inet6.notifier_call = NULL;
- }
-#endif
flush_workqueue(wq);
mlx4_ib_close_sriov(ibdev);
@@ -2773,19 +2316,6 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
kfree(ibdev->ib_uc_qpns_bitmap);
}
- if (ibdev->iboe.nb_inet.notifier_call) {
- if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
- pr_warn("failure unregistering notifier\n");
- ibdev->iboe.nb_inet.notifier_call = NULL;
- }
-#if IS_ENABLED(CONFIG_IPV6)
- if (ibdev->iboe.nb_inet6.notifier_call) {
- if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
- pr_warn("failure unregistering notifier\n");
- ibdev->iboe.nb_inet6.notifier_call = NULL;
- }
-#endif
-
iounmap(ibdev->uar_map);
for (p = 0; p < ibdev->num_ports; ++p)
if (ibdev->counters[p] != -1)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index c870ddb..19ffdab 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -473,12 +473,8 @@ struct mlx4_port_gid_table {
struct mlx4_ib_iboe {
spinlock_t lock;
struct net_device *netdevs[MLX4_MAX_PORTS];
- struct net_device *masters[MLX4_MAX_PORTS];
atomic64_t mac[MLX4_MAX_PORTS];
struct notifier_block nb;
- struct notifier_block nb_inet;
- struct notifier_block nb_inet6;
- union ib_gid gid_table[MLX4_MAX_PORTS][128];
struct mlx4_port_gid_table gids[MLX4_MAX_PORTS];
};
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 02fc91c6..d4393a1 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1292,14 +1292,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
path->static_rate = 0;
if (ah->ah_flags & IB_AH_GRH) {
- if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+ int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+ port,
+ ah->grh.sgid_index);
+
+ if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
pr_err("sgid_index (%u) too large. max is %d\n",
- ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+ real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
return -1;
}
path->grh_mylmc |= 1 << 7;
- path->mgid_index = ah->grh.sgid_index;
+ path->mgid_index = real_sgid_index;
path->hop_limit = ah->grh.hop_limit;
path->tclass_flowlabel =
cpu_to_be32((ah->grh.traffic_class << 20) |
--
2.1.0
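
A note on the qp.c hunk above: the sgid_index supplied in the address handle is now translated to a hardware ("real") index before the bounds check. Below is a minimal user-space sketch of that translate-then-validate pattern. All names in it (TABLE_LEN, real_index_map, gid_index_to_real_index, set_path_sgid) are hypothetical stand-ins, not the mlx4 implementation, and it assumes the translation helper may return a negative value when no hardware entry backs the slot, which is why it also rejects negative results.

```c
#define TABLE_LEN 8	/* hypothetical hardware GID table length */

/*
 * Hypothetical mapping from the core (cache) index to the hardware
 * index; -1 means that no hardware entry backs the slot.
 */
static const int real_index_map[TABLE_LEN] = { 0, 3, 1, -1, 2, -1, -1, -1 };

/* Stand-in for a driver helper that translates the index. */
static int gid_index_to_real_index(unsigned int index)
{
	if (index >= TABLE_LEN)
		return -1;
	return real_index_map[index];
}

/*
 * Translate first, then validate the translated value.
 * Returns 0 and stores the hardware index on success, -1 otherwise.
 * On failure *mgid_index is left untouched.
 */
static int set_path_sgid(unsigned int sgid_index, unsigned char *mgid_index)
{
	int real = gid_index_to_real_index(sgid_index);

	if (real < 0 || real >= TABLE_LEN)
		return -1;

	*mgid_index = (unsigned char)real;
	return 0;
}
```

The point of the sketch is only the ordering: the value written into the path is the translated index, so the range check has to be applied to that value rather than to the caller's index.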
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (10 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 11/12] IB/mlx4: Replace mechanism for RoCE GID management Matan Barak
@ 2015-06-08 14:12 ` Matan Barak
[not found] ` <1433772735-22416-13-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-08 21:37 ` [PATCH for-next V5 00/12] Move RoCE GID management " Hefty, Sean
12 siblings, 1 reply; 45+ messages in thread
From: Matan Barak @ 2015-06-08 14:12 UTC (permalink / raw)
To: Doug Ledford
Cc: Matan Barak, Or Gerlitz, Moni Shoua, Jason Gunthorpe, Sean Hefty,
Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Somnath Kotur,
Devesh Sharma
From: Somnath Kotur <somnath.kotur-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
1. Check and set port capability flags to indicate RoCEv2 support.
2. Change query_gid hook to return value from IB/Core GID Mgmt APIs.
3. Get rid of all the netdev notifier chain subscription code as well as
   maintenance of SGID Table in memory.
4. Implement get_netdev hook in driver.
Signed-off-by: Somnath Kotur <somnath.kotur-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Devesh Sharma <devesh.sharma-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/hw/ocrdma/ocrdma.h | 10 ++
drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 3 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 233 +---------------------------
drivers/infiniband/hw/ocrdma/ocrdma_sli.h | 13 ++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 31 +++-
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 4 +
6 files changed, 62 insertions(+), 232 deletions(-)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index c9780d9..ea6484c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -99,6 +99,7 @@ struct ocrdma_dev_attr {
u8 local_ca_ack_delay;
u8 ird;
u8 num_ird_pages;
+ u8 roce_flags;
};
struct ocrdma_dma_mem {
@@ -574,4 +575,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
(state & OCRDMA_STATE_FLAG_SYNC);
}
+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+ return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+ OCRDMA_ROUDP_FLAGS_SHIFT) ||
+ dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+ OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+ true : false;
+}
+
#endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index 0c9e959..42116a5 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+ attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+ OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+ OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
attr->max_mw = rsp->max_mw;
attr->max_mr = rsp->max_mr;
attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index f552898..0d3e915 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
static DEFINE_SPINLOCK(ocrdma_devlist_lock);
static DEFINE_IDR(ocrdma_dev_id);
-static union ib_gid ocrdma_zero_sgid;
-
void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
{
u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
guid[6] = mac_addr[4];
guid[7] = mac_addr[5];
}
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
- int i;
- unsigned long flags;
-
- memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
- spin_lock_irqsave(&dev->sgid_lock, flags);
- for (i = 0; i < OCRDMA_MAX_SGID; i++) {
- if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
- sizeof(union ib_gid))) {
- /* found free entry */
- memcpy(&dev->sgid_tbl[i], new_sgid,
- sizeof(union ib_gid));
- spin_unlock_irqrestore(&dev->sgid_lock, flags);
- return true;
- } else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
- sizeof(union ib_gid))) {
- /* entry already present, no addition is required. */
- spin_unlock_irqrestore(&dev->sgid_lock, flags);
- return false;
- }
- }
- spin_unlock_irqrestore(&dev->sgid_lock, flags);
- return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
- int found = false;
- int i;
- unsigned long flags;
-
-
- spin_lock_irqsave(&dev->sgid_lock, flags);
- /* first is default sgid, which cannot be deleted. */
- for (i = 1; i < OCRDMA_MAX_SGID; i++) {
- if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
- /* found matching entry */
- memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
- found = true;
- break;
- }
- }
- spin_unlock_irqrestore(&dev->sgid_lock, flags);
- return found;
-}
-
-static int ocrdma_addr_event(unsigned long event, struct net_device *netdev,
- union ib_gid *gid)
-{
- struct ib_event gid_event;
- struct ocrdma_dev *dev;
- bool found = false;
- bool updated = false;
- bool is_vlan = false;
-
- is_vlan = netdev->priv_flags & IFF_802_1Q_VLAN;
- if (is_vlan)
- netdev = rdma_vlan_dev_real_dev(netdev);
-
- rcu_read_lock();
- list_for_each_entry_rcu(dev, &ocrdma_dev_list, entry) {
- if (dev->nic_info.netdev == netdev) {
- found = true;
- break;
- }
- }
- rcu_read_unlock();
-
- if (!found)
- return NOTIFY_DONE;
-
- mutex_lock(&dev->dev_lock);
- switch (event) {
- case NETDEV_UP:
- updated = ocrdma_add_sgid(dev, gid);
- break;
- case NETDEV_DOWN:
- updated = ocrdma_del_sgid(dev, gid);
- break;
- default:
- break;
- }
- if (updated) {
- /* GID table updated, notify the consumers about it */
- gid_event.device = &dev->ibdev;
- gid_event.element.port_num = 1;
- gid_event.event = IB_EVENT_GID_CHANGE;
- ib_dispatch_event(&gid_event);
- }
- mutex_unlock(&dev->dev_lock);
- return NOTIFY_OK;
-}
-
-static int ocrdma_inetaddr_event(struct notifier_block *notifier,
- unsigned long event, void *ptr)
-{
- struct in_ifaddr *ifa = ptr;
- union ib_gid gid;
- struct net_device *netdev = ifa->ifa_dev->dev;
-
- ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
- return ocrdma_addr_event(event, netdev, &gid);
-}
-
-static struct notifier_block ocrdma_inetaddr_notifier = {
- .notifier_call = ocrdma_inetaddr_event
-};
-
-#if IS_ENABLED(CONFIG_IPV6)
-
-static int ocrdma_inet6addr_event(struct notifier_block *notifier,
- unsigned long event, void *ptr)
-{
- struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
- union ib_gid *gid = (union ib_gid *)&ifa->addr;
- struct net_device *netdev = ifa->idev->dev;
- return ocrdma_addr_event(event, netdev, gid);
-}
-
-static struct notifier_block ocrdma_inet6addr_notifier = {
- .notifier_call = ocrdma_inet6addr_event
-};
-
-#endif /* IPV6 and VLAN */
-
static enum rdma_link_layer ocrdma_link_layer(struct ib_device *device,
u8 port_num)
{
@@ -263,6 +132,8 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
dev->ibdev.query_port = ocrdma_query_port;
dev->ibdev.modify_port = ocrdma_modify_port;
dev->ibdev.query_gid = ocrdma_query_gid;
+ dev->ibdev.get_netdev = ocrdma_get_netdev;
+ dev->ibdev.modify_gid = ocrdma_modify_gid;
dev->ibdev.get_link_layer = ocrdma_link_layer;
dev->ibdev.alloc_pd = ocrdma_alloc_pd;
dev->ibdev.dealloc_pd = ocrdma_dealloc_pd;
@@ -325,12 +196,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
{
mutex_init(&dev->dev_lock);
- dev->sgid_tbl = kzalloc(sizeof(union ib_gid) *
- OCRDMA_MAX_SGID, GFP_KERNEL);
- if (!dev->sgid_tbl)
- goto alloc_err;
- spin_lock_init(&dev->sgid_lock);
-
dev->cq_tbl = kzalloc(sizeof(struct ocrdma_cq *) *
OCRDMA_MAX_CQ, GFP_KERNEL);
if (!dev->cq_tbl)
@@ -362,7 +227,6 @@ static void ocrdma_free_resources(struct ocrdma_dev *dev)
kfree(dev->stag_arr);
kfree(dev->qp_tbl);
kfree(dev->cq_tbl);
- kfree(dev->sgid_tbl);
}
/* OCRDMA sysfs interface */
@@ -408,68 +272,6 @@ static void ocrdma_remove_sysfiles(struct ocrdma_dev *dev)
device_remove_file(&dev->ibdev.dev, ocrdma_attributes[i]);
}
-static void ocrdma_add_default_sgid(struct ocrdma_dev *dev)
-{
- /* GID Index 0 - Invariant manufacturer-assigned EUI-64 */
- union ib_gid *sgid = &dev->sgid_tbl[0];
-
- sgid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
- ocrdma_get_guid(dev, &sgid->raw[8]);
-}
-
-static void ocrdma_init_ipv4_gids(struct ocrdma_dev *dev,
- struct net_device *net)
-{
- struct in_device *in_dev;
- union ib_gid gid;
- in_dev = in_dev_get(net);
- if (in_dev) {
- for_ifa(in_dev) {
- ipv6_addr_set_v4mapped(ifa->ifa_address,
- (struct in6_addr *)&gid);
- ocrdma_add_sgid(dev, &gid);
- }
- endfor_ifa(in_dev);
- in_dev_put(in_dev);
- }
-}
-
-static void ocrdma_init_ipv6_gids(struct ocrdma_dev *dev,
- struct net_device *net)
-{
-#if IS_ENABLED(CONFIG_IPV6)
- struct inet6_dev *in6_dev;
- union ib_gid *pgid;
- struct inet6_ifaddr *ifp;
- in6_dev = in6_dev_get(net);
- if (in6_dev) {
- read_lock_bh(&in6_dev->lock);
- list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
- pgid = (union ib_gid *)&ifp->addr;
- ocrdma_add_sgid(dev, pgid);
- }
- read_unlock_bh(&in6_dev->lock);
- in6_dev_put(in6_dev);
- }
-#endif
-}
-
-static void ocrdma_init_gid_table(struct ocrdma_dev *dev)
-{
- struct net_device *net_dev;
-
- for_each_netdev(&init_net, net_dev) {
- struct net_device *real_dev = rdma_vlan_dev_real_dev(net_dev) ?
- rdma_vlan_dev_real_dev(net_dev) : net_dev;
-
- if (real_dev == dev->nic_info.netdev) {
- ocrdma_add_default_sgid(dev);
- ocrdma_init_ipv4_gids(dev, net_dev);
- ocrdma_init_ipv6_gids(dev, net_dev);
- }
- }
-}
-
static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
{
int status = 0, i;
@@ -498,7 +300,6 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
goto alloc_err;
ocrdma_init_service_level(dev);
- ocrdma_init_gid_table(dev);
status = ocrdma_register_device(dev);
if (status)
goto alloc_err;
@@ -645,34 +446,12 @@ static struct ocrdma_driver ocrdma_drv = {
.be_abi_version = OCRDMA_BE_ROCE_ABI_VERSION,
};
-static void ocrdma_unregister_inet6addr_notifier(void)
-{
-#if IS_ENABLED(CONFIG_IPV6)
- unregister_inet6addr_notifier(&ocrdma_inet6addr_notifier);
-#endif
-}
-
-static void ocrdma_unregister_inetaddr_notifier(void)
-{
- unregister_inetaddr_notifier(&ocrdma_inetaddr_notifier);
-}
-
static int __init ocrdma_init_module(void)
{
int status;
ocrdma_init_debugfs();
- status = register_inetaddr_notifier(&ocrdma_inetaddr_notifier);
- if (status)
- return status;
-
-#if IS_ENABLED(CONFIG_IPV6)
- status = register_inet6addr_notifier(&ocrdma_inet6addr_notifier);
- if (status)
- goto err_notifier6;
-#endif
-
status = be_roce_register_driver(&ocrdma_drv);
if (status)
goto err_be_reg;
@@ -680,19 +459,13 @@ static int __init ocrdma_init_module(void)
return 0;
err_be_reg:
-#if IS_ENABLED(CONFIG_IPV6)
- ocrdma_unregister_inet6addr_notifier();
-err_notifier6:
-#endif
- ocrdma_unregister_inetaddr_notifier();
+
return status;
}
static void __exit ocrdma_exit_module(void)
{
be_roce_unregister_driver(&ocrdma_drv);
- ocrdma_unregister_inet6addr_notifier();
- ocrdma_unregister_inetaddr_notifier();
ocrdma_rem_debugfs();
}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_sli.h b/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
index 243c87c..6b74eb9 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_sli.h
@@ -125,6 +125,14 @@ enum {
OCRDMA_DB_RQ_SHIFT = 24
};
+enum {
+ OCRDMA_L3_TYPE_IB_GRH = 0x00,
+ OCRDMA_L3_TYPE_IPV4 = 0x01,
+ OCRDMA_L3_TYPE_IPV6 = 0x02
+};
+
+#define OCRDMA_ROUDP_FLAGS_SHIFT 0x03
+
#define OCRDMA_DB_CQ_RING_ID_MASK 0x3FF /* bits 0 - 9 */
#define OCRDMA_DB_CQ_RING_ID_EXT_MASK 0x0C00 /* bits 10-11 of qid at 12-11 */
/* qid #2 msbits at 12-11 */
@@ -488,6 +496,9 @@ enum {
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT = 8,
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK = 0xFF <<
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT,
+ OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT = 0,
+ OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK = 0xFF <<
+ OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT,
OCRDMA_MBX_QUERY_CFG_MAX_SEND_SGE_SHIFT = 0,
OCRDMA_MBX_QUERY_CFG_MAX_SEND_SGE_MASK = 0xFFFF,
@@ -1049,6 +1060,8 @@ enum {
OCRDMA_QP_PARAMS_STATE_MASK = BIT(5) | BIT(6) | BIT(7),
OCRDMA_QP_PARAMS_FLAGS_SQD_ASYNC = BIT(8),
OCRDMA_QP_PARAMS_FLAGS_INB_ATEN = BIT(9),
+ OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT = 11,
+ OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK = BIT(11) | BIT(12) | BIT(13),
OCRDMA_QP_PARAMS_MAX_SGE_RECV_SHIFT = 16,
OCRDMA_QP_PARAMS_MAX_SGE_RECV_MASK = 0xFFFF <<
OCRDMA_QP_PARAMS_MAX_SGE_RECV_SHIFT,
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index cf1f515..f1c4290 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -31,6 +31,7 @@
#include <rdma/iw_cm.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
#include "ocrdma.h"
#include "ocrdma_hw.h"
@@ -49,6 +50,7 @@ int ocrdma_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey)
int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
int index, union ib_gid *sgid)
{
+ int ret;
struct ocrdma_dev *dev;
dev = get_ocrdma_dev(ibdev);
@@ -56,7 +58,22 @@ int ocrdma_query_gid(struct ib_device *ibdev, u8 port,
if (index >= OCRDMA_MAX_SGID)
return -EINVAL;
- memcpy(sgid, &dev->sgid_tbl[index], sizeof(*sgid));
+ ret = ib_get_cached_gid(ibdev, port, index, sgid);
+ if (ret == -EAGAIN) {
+ memcpy(sgid, &zgid, sizeof(*sgid));
+ return 0;
+ }
+
+ return ret;
+}
+
+int ocrdma_modify_gid(struct ib_device *ibdev, u8 port_num, unsigned int index,
+ const union ib_gid *gid, const struct ib_gid_attr *attr,
+ void **context)
+{
+ struct ocrdma_dev *dev;
+
+ dev = get_ocrdma_dev(ibdev);
return 0;
}
@@ -106,6 +123,15 @@ int ocrdma_query_device(struct ib_device *ibdev, struct ib_device_attr *attr)
return 0;
}
+struct net_device *ocrdma_get_netdev(struct ib_device *ibdev, u8 port_num)
+{
+ struct ocrdma_dev *dev = get_ocrdma_dev(ibdev);
+
+ if (dev)
+ return dev->nic_info.netdev;
+
+ return NULL;
+}
static inline void get_link_speed_and_width(struct ocrdma_dev *dev,
u8 *ib_speed, u8 *ib_width)
{
@@ -175,7 +201,8 @@ int ocrdma_query_port(struct ib_device *ibdev,
props->port_cap_flags =
IB_PORT_CM_SUP |
IB_PORT_REINIT_SUP |
- IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP | IB_PORT_IP_BASED_GIDS;
+ IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP |
+ IB_PORT_IP_BASED_GIDS | IB_PORT_ROCE;
props->gid_tbl_len = OCRDMA_MAX_SGID;
props->pkey_tbl_len = 1;
props->bad_pkey_cntr = 0;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 3cdc81e..b24795c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -47,6 +47,10 @@ ocrdma_query_protocol(struct ib_device *device, u8 port_num);
void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
int ocrdma_query_gid(struct ib_device *, u8 port,
int index, union ib_gid *gid);
+struct net_device *ocrdma_get_netdev(struct ib_device *device, u8 port_num);
+int ocrdma_modify_gid(struct ib_device *ibdev, u8 port_num, unsigned int index,
+ const union ib_gid *gid, const struct ib_gid_attr *attr,
+ void **context);
int ocrdma_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
struct ib_ucontext *ocrdma_alloc_ucontext(struct ib_device *,
--
2.1.0
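
A note on ocrdma_is_rocev2_supported() in the ocrdma.h hunk above: it tests whether either the IPv4 or the IPv6 L3 type bit is set in roce_flags after shifting by OCRDMA_ROUDP_FLAGS_SHIFT. The following is a minimal user-space sketch of the same check written with a single combined mask. The constant values are copied from the ocrdma_sli.h hunk; the shortened names and the function rocev2_supported() are illustrative only, and the sketch assumes roce_flags carries the bit layout those constants imply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in the ocrdma_sli.h hunk, with shortened names. */
enum {
	L3_TYPE_IB_GRH = 0x00,
	L3_TYPE_IPV4   = 0x01,
	L3_TYPE_IPV6   = 0x02
};

#define ROUDP_FLAGS_SHIFT 3

/*
 * True if either the IPv4 or the IPv6 RoCE-over-UDP bit is set.
 * The combined mask is (0x01 | 0x02) << 3 == 0x18.
 */
static bool rocev2_supported(uint8_t roce_flags)
{
	const uint8_t mask = (L3_TYPE_IPV4 | L3_TYPE_IPV6) << ROUDP_FLAGS_SHIFT;

	return (roce_flags & mask) != 0;
}
```

Comparing against zero already yields a bool, so the explicit "? true : false" in the patch is not needed for correctness; whether to fold the two tests into one mask is a matter of taste.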
* RE: [PATCH for-next V5 00/12] Move RoCE GID management to IB/Core
[not found] ` <1433772735-22416-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (11 preceding siblings ...)
2015-06-08 14:12 ` [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core Matan Barak
@ 2015-06-08 21:37 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373A8FE5D17-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
12 siblings, 1 reply; 45+ messages in thread
From: Hefty, Sean @ 2015-06-08 21:37 UTC (permalink / raw)
To: Matan Barak, Doug Ledford
Cc: Or Gerlitz, Moni Shoua, Jason Gunthorpe, Somnath Kotur,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Previously, every vendor implemented its net device notifiers in its own
> driver. This introduces a huge code duplication as figuring
> 28 files changed, 2253 insertions(+), 860 deletions(-)
How does adding 1400 lines of code help reduce code duplication?
Can you please explain and justify why this change is actually needed?