[PATCH for-next V7 00/10] Move RoCE GID management to IB/Core

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
@ 2015-07-30 15:33 Matan Barak
  2015-07-30 15:33 ` [PATCH for-next V7 01/10] net/ipv6: Export addrconf_ifid_eui48 Matan Barak
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Matan Barak @ 2015-07-30 15:33 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, Jason Gunthorpe, linux-rdma, Sean Hefty,
	Somnath Kotur, Moni Shoua, talal, haggaie, netdev

This series has been running in linux-rdma for a while. We added here
CC to netdev for the three pre-patches which come first. They allow
the IB core to access some helpers (e.g generating default Eth IPv6
link local address), gain more info on bonding changes, etc.

Previously, every vendor implemented its net device notifiers in its own
driver. This introduces a huge code duplication as figuring
whether an event is related to the vendor's net device in the
various cases (bonding, vlan or any other upper device) is
similar for all vendors. In the future, when multiple GID types will
be supported, this code duplication would have gotten even worse.

Therefore, we decided moving this into a common core core.
roce_gid_table and roce_gid_mgmt were created in order to store and
manage the new GID table, by filling it when getting the related events.
Vendors now only have to implement modify_gid and get_netdev IB
device calls, which are truly unique for each vendor.
roce_gid_table is implemented as IB client that manages the GID
table of the IB device. Each GID is associated with a GID type and a
network device (which is mandatory for management of the GID table).
The GID table is populated by using roce_gid_mgmt. roce_gid_mgmt
registers to net device/inet/inet events and calls roce_gid_table
in order to populate the GID table accordingly.

Patch 0005 is the core patch in this series. It creates a new infrastructure
for storing GIDs and their attributes in IB/core. This infrastructure support
reading and writing GIDs alongside with their meta-data. The new infrastructure
is used for both manageing RoCE ports and IB ports. The core difference is that
in IB ports, this infrastructure is used souly as a cache, while in RoCE we
actually manage the vendor's GID table by calling add_gid and del_gid callbacks.
In RoCE, we always enable default gids for an active device (an active device
is defined here as a device that doesn't have a bonding master or is the current
active slave). This is done in order to allow loopback traffic.

Patch 0004 replaces the locking schema for IB devices. Previously, device_mutex
was used in order to lock the devices/clients list against every modification.
However, downstream patches add new functions which iterate over the device
list. Those functions could be executed for a workqueue contexts on behalf
of IB clients. Thus, when a client is removed, we need to wait for all works
to be finished. Since a client removal was done in device_mutex lock, we'll
be in fact waiting for a work which requires to lock the device_mutex itself
(=DEADLOCK). In order to mitigate this problem, we use rw semaphore to allow
multiple readers. We use a mutex in order to solve races between adding
(or removing) a client and a device simultaneously, which could have resulted
in calling client->add (or client->remove) twice for the same device and client.
This patch was sent as part of "Add network namespace support in the RDMA-CM"
series.

Patch 0006 adds population of this table for the bonding case based on net
device events. Only the active slaves retain their master's IP based gids and
default gids.

Patch 0001 exports addrconf_ifid_eui48 in order to generate the default GID.
Patch 0002 adds information for NETDEV_CHANGEUPPER which is used in order to
understand the nature of change - link/unlink and which master net-device is
related to this change.
Patch 0003 exports bond_option_active_slave_get_rcu which is necassary in
order to assign the GIDs only to the active slave.

The rest of the patches add support for ocrdma and mlx4 devices.

This series is rebased over Doug's to-be-rebase/for-4.3.

Thanks,
Devesh, Somnath, Moni and Matan

Changes from V6:
(1) Addressed Jason's comments:
	(a) Cache is no longer a client but part of IB infrastructure
	(b) No need for READ_ONCE and flush_workqueue when tearing down
	    the cache

Changes from V5:
(1) Incoporate the changes to cache.c so we use the same infrastructure
    to manage both IB and RoCE (per Doug's request)
(2) Replace the locking mechanism in the IB core GID cache from seqcount +
    rcu to rwlock (addressing comments from Jason)
(3) get_netdev returns a helded (dev_hold) device
(4) Squashed the RocE GID table, RoCE GID management and default GID handling
    code into one patch (per Doug's request).
(5) Change modify_gid to add_gid and del_gid.
(6) set the netdev related changes into three dedicated patches and make
    them be 1st in the series.

Changes from V4:
(1) Remove any API changes.
(2) Fixed a bug regarding bonding upper devices.
(3) Rebased ontop of Doug's k.o/for-4.2.

Changes from V3:
(1) Remove RoCE V2 functionality (it will be sent at later patchset).
(2) Instead of removing qp_attr_mask flags, reserve them.
(3) Remove the kref from IB devices in favor of rwsem.
(4) Change the name of roce_gid_cache to roce_gid_table.
(5) Fix a race when roce_gid_table is free'd while getting events.
(6) Remove the roce_gid_cache active/inactive flag/API.

Changes from V2:
(1) When creating multiple vlans over an interface,
    only the last created vlan's GID was populated in the table
    (regression from V2).
(2) Inactive slave of bonding sometimes lost GIDs related to IPs
    that were directly applied to it.
(3) Memory leak in mlx4
(4) roce_gid_cache now calls modify_gid with zgid in order to cause
    the provider to delete all the information it allocated for those
    GIDs.
(4) A mlx4 patch didn't compile and a downstream patch fixed it.
(5) cma_configfs should depend on both address translation and configfs.
(6) ocrdma driver redefined zgid.
(7) Added event information for NETDEV_CHANGEUPPER event.

Changes from V1:
(1) Addressed Shachar and Haggai's comments
(2) Fixed multicast support
(3) Generalized bonding support
(4) Added default GID after the IB device's net device was removed from bonding
(5) Fixed bugs in mlx4 implementation regarding multicast
(6) Fixed bugs in mlx4 when using XRC QPs after this patchset was applied
(7) Fixed bug when the RoCE gid cache didn't exist
(8) Moved the bonding's DRV macros to a private header
(9) Support non-configfs configurations

Haggai Eran (1):
  IB/core: Add rwsem to allow reading device list or client list

Matan Barak (5):
  net/ipv6: Export addrconf_ifid_eui48
  net: Add info for NETDEV_CHANGEUPPER event
  net/bonding: Export bond_option_active_slave_get_rcu
  IB/core: Add RoCE GID table management
  IB/core: Add RoCE table bonding support

Moni Shoua (3):
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Implement ib_device callbacks
  IB/mlx4: Replace mechanism for RoCE GID management

Somnath Kotur (1):
  RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core

 drivers/infiniband/core/Makefile             |   3 +-
 drivers/infiniband/core/cache.c              | 704 ++++++++++++++++++++++---
 drivers/infiniband/core/core_priv.h          |  50 +-
 drivers/infiniband/core/device.c             | 134 ++++-
 drivers/infiniband/core/roce_gid_mgmt.c      | 730 ++++++++++++++++++++++++++
 drivers/infiniband/core/sysfs.c              |   1 +
 drivers/infiniband/hw/mlx4/ah.c              |   2 +-
 drivers/infiniband/hw/mlx4/main.c            | 749 ++++++++++-----------------
 drivers/infiniband/hw/mlx4/mlx4_ib.h         |  21 +-
 drivers/infiniband/hw/mlx4/qp.c              |  10 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h        |   1 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   | 234 +--------
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h    |   2 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  45 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |  11 +
 drivers/net/bonding/bond_options.c           |  13 -
 drivers/net/ethernet/mellanox/mlx4/en_main.c |  36 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c    |   3 +
 include/linux/mlx4/device.h                  |   3 +-
 include/linux/mlx4/driver.h                  |   1 +
 include/linux/netdevice.h                    |  14 +
 include/net/addrconf.h                       |  31 ++
 include/net/bonding.h                        |   7 +
 include/rdma/ib_verbs.h                      |  68 ++-
 net/core/dev.c                               |  12 +-
 net/ipv6/addrconf.c                          |  31 --
 26 files changed, 2017 insertions(+), 899 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

-- 
2.1.0

Cc: netdev@vger.kernel.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH for-next V7 01/10] net/ipv6: Export addrconf_ifid_eui48
  2015-07-30 15:33 [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Matan Barak
@ 2015-07-30 15:33 ` Matan Barak
  2015-07-30 15:33 ` [PATCH for-next V7 02/10] net: Add info for NETDEV_CHANGEUPPER event Matan Barak
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Matan Barak @ 2015-07-30 15:33 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, Jason Gunthorpe, linux-rdma, Sean Hefty,
	Somnath Kotur, Moni Shoua, talal, haggaie, netdev

For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.

addrconf_ifid_eui48 is used to gernerate this address, export it.

Signed-off-by: Matan Barak <matanb@mellanox.com>
---
 include/net/addrconf.h | 31 +++++++++++++++++++++++++++++++
 net/ipv6/addrconf.c    | 31 -------------------------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index def59d3..431fdfa 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -91,6 +91,37 @@ int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2);
 void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
 void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr);
 
+static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
+{
+	if (dev->addr_len != ETH_ALEN)
+		return -1;
+	memcpy(eui, dev->dev_addr, 3);
+	memcpy(eui + 5, dev->dev_addr + 3, 3);
+
+	/*
+	 * The zSeries OSA network cards can be shared among various
+	 * OS instances, but the OSA cards have only one MAC address.
+	 * This leads to duplicate address conflicts in conjunction
+	 * with IPv6 if more than one instance uses the same card.
+	 *
+	 * The driver for these cards can deliver a unique 16-bit
+	 * identifier for each instance sharing the same card.  It is
+	 * placed instead of 0xFFFE in the interface identifier.  The
+	 * "u" bit of the interface identifier is not inverted in this
+	 * case.  Hence the resulting interface identifier has local
+	 * scope according to RFC2373.
+	 */
+	if (dev->dev_id) {
+		eui[3] = (dev->dev_id >> 8) & 0xFF;
+		eui[4] = dev->dev_id & 0xFF;
+	} else {
+		eui[3] = 0xFF;
+		eui[4] = 0xFE;
+		eui[0] ^= 2;
+	}
+	return 0;
+}
+
 static inline unsigned long addrconf_timeout_fixup(u32 timeout,
 						   unsigned int unit)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..5b0c041 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1845,37 +1845,6 @@ static void addrconf_leave_anycast(struct inet6_ifaddr *ifp)
 	__ipv6_dev_ac_dec(ifp->idev, &addr);
 }
 
-static int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
-{
-	if (dev->addr_len != ETH_ALEN)
-		return -1;
-	memcpy(eui, dev->dev_addr, 3);
-	memcpy(eui + 5, dev->dev_addr + 3, 3);
-
-	/*
-	 * The zSeries OSA network cards can be shared among various
-	 * OS instances, but the OSA cards have only one MAC address.
-	 * This leads to duplicate address conflicts in conjunction
-	 * with IPv6 if more than one instance uses the same card.
-	 *
-	 * The driver for these cards can deliver a unique 16-bit
-	 * identifier for each instance sharing the same card.  It is
-	 * placed instead of 0xFFFE in the interface identifier.  The
-	 * "u" bit of the interface identifier is not inverted in this
-	 * case.  Hence the resulting interface identifier has local
-	 * scope according to RFC2373.
-	 */
-	if (dev->dev_id) {
-		eui[3] = (dev->dev_id >> 8) & 0xFF;
-		eui[4] = dev->dev_id & 0xFF;
-	} else {
-		eui[3] = 0xFF;
-		eui[4] = 0xFE;
-		eui[0] ^= 2;
-	}
-	return 0;
-}
-
 static int addrconf_ifid_eui64(u8 *eui, struct net_device *dev)
 {
 	if (dev->addr_len != IEEE802154_ADDR_LEN)
-- 
2.1.0

Cc: netdev@vger.kernel.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next V7 02/10] net: Add info for NETDEV_CHANGEUPPER event
  2015-07-30 15:33 [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Matan Barak
  2015-07-30 15:33 ` [PATCH for-next V7 01/10] net/ipv6: Export addrconf_ifid_eui48 Matan Barak
@ 2015-07-30 15:33 ` Matan Barak
       [not found] ` <1438270411-17648-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-31  9:40 ` [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Or Gerlitz
  3 siblings, 0 replies; 16+ messages in thread
From: Matan Barak @ 2015-07-30 15:33 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, Jason Gunthorpe, linux-rdma, Sean Hefty,
	Somnath Kotur, Moni Shoua, talal, haggaie, netdev

Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried.

Add information in the notifier info block for that purpose.

Signed-off-by: Matan Barak <matanb@mellanox.com>
---
 include/linux/netdevice.h | 14 ++++++++++++++
 net/core/dev.c            | 12 ++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e20979d..2b7fe4e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3556,6 +3556,20 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 				    netdev_features_t features);
 
+enum netdev_changeupper_event {
+	NETDEV_CHANGEUPPER_LINK,
+	NETDEV_CHANGEUPPER_UNLINK,
+};
+
+struct netdev_changeupper_info {
+	struct netdev_notifier_info	info; /* must be first */
+	enum netdev_changeupper_event	event;
+	struct net_device		*upper;
+};
+
+void netdev_changeupper_info_change(struct net_device *dev,
+				    struct netdev_changeupper_info *info);
+
 struct netdev_bonding_info {
 	ifslave	slave;
 	ifbond	master;
diff --git a/net/core/dev.c b/net/core/dev.c
index a8e4dd4..6e6f14e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5302,6 +5302,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 				   void *private)
 {
 	struct netdev_adjacent *i, *j, *to_i, *to_j;
+	struct netdev_changeupper_info changeupper_info;
 	int ret = 0;
 
 	ASSERT_RTNL();
@@ -5357,7 +5358,10 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 			goto rollback_lower_mesh;
 	}
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_LINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 	return 0;
 
 rollback_lower_mesh:
@@ -5453,6 +5457,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 			     struct net_device *upper_dev)
 {
 	struct netdev_adjacent *i, *j;
+	struct netdev_changeupper_info changeupper_info;
 	ASSERT_RTNL();
 
 	__netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
@@ -5474,7 +5479,10 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
 		__netdev_adjacent_dev_unlink(dev, i->dev);
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_UNLINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 }
 EXPORT_SYMBOL(netdev_upper_dev_unlink);
 
-- 
2.1.0

Cc: netdev@vger.kernel.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next V7 03/10] net/bonding: Export bond_option_active_slave_get_rcu
       [not found] ` <1438270411-17648-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-30 15:33   ` Matan Barak
  0 siblings, 0 replies; 16+ messages in thread
From: Matan Barak @ 2015-07-30 15:33 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, Or Gerlitz, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sean Hefty, Somnath Kotur,
	Moni Shoua, talal-VPRAkNaXOzVWk0Htik3J/w,
	haggaie-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA

Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/bonding/bond_options.c | 13 -------------
 include/net/bonding.h              |  7 +++++++
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index e9c624d..28bd005 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -730,19 +730,6 @@ static int bond_option_mode_set(struct bonding *bond,
 	return 0;
 }
 
-static struct net_device *__bond_option_active_slave_get(struct bonding *bond,
-							 struct slave *slave)
-{
-	return bond_uses_primary(bond) && slave ? slave->dev : NULL;
-}
-
-struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond)
-{
-	struct slave *slave = rcu_dereference(bond->curr_active_slave);
-
-	return __bond_option_active_slave_get(bond, slave);
-}
-
 static int bond_option_active_slave_set(struct bonding *bond,
 					const struct bond_opt_value *newval)
 {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 20defc0..c1740a2 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -310,6 +310,13 @@ static inline bool bond_uses_primary(struct bonding *bond)
 	return bond_mode_uses_primary(BOND_MODE(bond));
 }
 
+static inline struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond)
+{
+	struct slave *slave = rcu_dereference(bond->curr_active_slave);
+
+	return bond_uses_primary(bond) && slave ? slave->dev : NULL;
+}
+
 static inline bool bond_slave_is_up(struct slave *slave)
 {
 	return netif_running(slave->dev) && netif_carrier_ok(slave->dev);
-- 
2.1.0

Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
  2015-07-30 15:33 [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Matan Barak
                   ` (2 preceding siblings ...)
       [not found] ` <1438270411-17648-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-31  9:40 ` Or Gerlitz
       [not found]   ` <CAJ3xEMh9z=LG7dW0nBP8Zdrbf0SkZ_GCxjCoowWXuw2dueN+xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  3 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2015-07-31  9:40 UTC (permalink / raw)
  To: Matan Barak, Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, Sean Hefty, Somnath Kotur, Moni Shoua,
	talal@mellanox.com, Haggai Eran, Linux Netdev List

On Thu, Jul 30, 2015 at 6:33 PM, Matan Barak <matanb@mellanox.com> wrote:

[...]

> Changes from V6:
> (1) Addressed Jason's comments:
>         (a) Cache is no longer a client but part of IB infrastructure
>         (b) No need for READ_ONCE and flush_workqueue when tearing down
>             the cache

Doug

So... are we ready to go with V7 upstream?

Or.


> Changes from V5:
> (1) Incoporate the changes to cache.c so we use the same infrastructure
>     to manage both IB and RoCE (per Doug's request)
> (2) Replace the locking mechanism in the IB core GID cache from seqcount +
>     rcu to rwlock (addressing comments from Jason)
> (3) get_netdev returns a helded (dev_hold) device
> (4) Squashed the RocE GID table, RoCE GID management and default GID handling
>     code into one patch (per Doug's request).
> (5) Change modify_gid to add_gid and del_gid.
> (6) set the netdev related changes into three dedicated patches and make
>     them be 1st in the series.
>
> Changes from V4:
> (1) Remove any API changes.
> (2) Fixed a bug regarding bonding upper devices.
> (3) Rebased ontop of Doug's k.o/for-4.2.
>
> Changes from V3:
> (1) Remove RoCE V2 functionality (it will be sent at later patchset).
> (2) Instead of removing qp_attr_mask flags, reserve them.
> (3) Remove the kref from IB devices in favor of rwsem.
> (4) Change the name of roce_gid_cache to roce_gid_table.
> (5) Fix a race when roce_gid_table is free'd while getting events.
> (6) Remove the roce_gid_cache active/inactive flag/API.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]   ` <CAJ3xEMh9z=LG7dW0nBP8Zdrbf0SkZ_GCxjCoowWXuw2dueN+xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-07-31 12:50     ` Doug Ledford
       [not found]       ` <55BB6F10.5030408-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Doug Ledford @ 2015-07-31 12:50 UTC (permalink / raw)
  To: Or Gerlitz, Matan Barak, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

On 07/31/2015 05:40 AM, Or Gerlitz wrote:
> On Thu, Jul 30, 2015 at 6:33 PM, Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> 
> [...]
> 
>> Changes from V6:
>> (1) Addressed Jason's comments:
>>         (a) Cache is no longer a client but part of IB infrastructure
>>         (b) No need for READ_ONCE and flush_workqueue when tearing down
>>             the cache
> 
> Doug
> 
> So... are we ready to go with V7 upstream?

Yes.  I've already pulled it in.

> Or.
> 
> 
>> Changes from V5:
>> (1) Incoporate the changes to cache.c so we use the same infrastructure
>>     to manage both IB and RoCE (per Doug's request)
>> (2) Replace the locking mechanism in the IB core GID cache from seqcount +
>>     rcu to rwlock (addressing comments from Jason)
>> (3) get_netdev returns a helded (dev_hold) device
>> (4) Squashed the RocE GID table, RoCE GID management and default GID handling
>>     code into one patch (per Doug's request).
>> (5) Change modify_gid to add_gid and del_gid.
>> (6) set the netdev related changes into three dedicated patches and make
>>     them be 1st in the series.
>>
>> Changes from V4:
>> (1) Remove any API changes.
>> (2) Fixed a bug regarding bonding upper devices.
>> (3) Rebased ontop of Doug's k.o/for-4.2.
>>
>> Changes from V3:
>> (1) Remove RoCE V2 functionality (it will be sent at later patchset).
>> (2) Instead of removing qp_attr_mask flags, reserve them.
>> (3) Remove the kref from IB devices in favor of rwsem.
>> (4) Change the name of roce_gid_cache to roce_gid_table.
>> (5) Fix a race when roce_gid_table is free'd while getting events.
>> (6) Remove the roce_gid_cache active/inactive flag/API.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]       ` <55BB6F10.5030408-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-07-31 16:32         ` Jason Gunthorpe
       [not found]           ` <20150731163201.GA6027-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2015-07-31 16:32 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Or Gerlitz, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

On Fri, Jul 31, 2015 at 08:50:24AM -0400, Doug Ledford wrote:

> > So... are we ready to go with V7 upstream?
> 
> Yes.  I've already pulled it in.

You are taking the netdev stuff without an ack from netdev??

I've been too busy too look at v7, but a quick check of the 'move the
cache into core code stuff' looks wrong to
me. ib_unregister_event_handler and flush_workqueue should not/can not be
called from a kref free'r.

Looking at your for-rebase branch.. please make sure you get all the
attributions this cycle :|

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]           ` <20150731163201.GA6027-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-31 17:41             ` Doug Ledford
       [not found]               ` <55BBB353.7020806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-07-31 21:24               ` Or Gerlitz
  0 siblings, 2 replies; 16+ messages in thread
From: Doug Ledford @ 2015-07-31 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 3279 bytes --]

On 07/31/2015 12:32 PM, Jason Gunthorpe wrote:
> On Fri, Jul 31, 2015 at 08:50:24AM -0400, Doug Ledford wrote:
> 
>>> So... are we ready to go with V7 upstream?
>>
>> Yes.  I've already pulled it in.
> 
> You are taking the netdev stuff without an ack from netdev??

I've pulled it in, yes.  Dave Miller/netdev needs to ack it still.  I
really doubt they will object to the 1st or 3rd patch, but they might
have comments on the second.  Since they weren't copied on the original
submission, I'll be pinging them separately.  Dave may prefer if they go
through his tree, and that's fine, but I still need them in my tree as
of now for testing purposes.

> I've been too busy too look at v7, but a quick check of the 'move the
> cache into core code stuff' looks wrong to
> me. ib_unregister_event_handler and flush_workqueue should not/can not be
> called from a kref free'r.

Please be more specific here.  What are you objecting to?  Are you
objecting to a flush_workqueue from a release() context?  Because I
don't see anything in the kref documentation or the kref implementation
that prevents or prohibits this.  Or are you referring to the fact that
they aren't unregistering their event handler until well after the
parent device is unregistered?  That's obviously wrong, but it's also
easy to fix (the obvious fix is that they should be calling
ib_cache_cleanup_one from the top of ib_unregister_device versus waiting
for a kref put).

Regardless though, the reason I'm taking this (and a number of other
things) into my tree is that waiting for things to be absolutely perfect
on a submission is an interminable waiting game.  We have about 3 weeks
or maybe a little more until the next merge window opens.  In that time,
doing v8 of this patchset, and v9 of that patchset, and v5 of another
patchset all gets to be a huge bog on everyone's time.  These various
patchsets have reached the point where they are close and incremental
patches would be better than resubmitting the entire patchset over and
over again.  Not to mention that some of these fixes are quick and easy
to do and I'd rather quit waiting 24+ hours for a respin turnaround when
I could just hop in and make the fix myself in 15 minutes and test it
immediately.

> Looking at your for-rebase branch.. please make sure you get all the
> attributions this cycle :|

Yet *another* reason why these v6, v7, v8 patchsets are a huge time
drain :-/.  For a 10 patch v7 patchset, I need to look through 70 patch
listings in patchworks and try to see which reviewers didn't get their
attribution carried forward, and on top of that I have to make a
judgment call about which reviewed-bys should or shouldn't get carried
forward based upon how much changed in the next patch and whether or not
a previous review is still relevant or has the patch changed enough that
it needs a new review?  Really, this is my only complaint about
patchworks.  Aside from attribution loss on resubmit, it works great.

Anyway, this thread brings up an important issue, scheduling for the 4.3
merge window.  I'll bring that up in a separate email later today.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]               ` <55BBB353.7020806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-07-31 20:18                 ` Jason Gunthorpe
       [not found]                   ` <20150731201828.GA28263-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2015-07-31 20:18 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Or Gerlitz, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

On Fri, Jul 31, 2015 at 01:41:39PM -0400, Doug Ledford wrote:
> Please be more specific here.  What are you objecting to?  Are you
> objecting to a flush_workqueue from a release() context?  Because I
> don't see anything in the kref documentation or the kref
> implementation that prevents or prohibits this.  Or are you
> referring to the fact that they aren't unregistering their event
> handler until well after the parent device is unregistered?

I'm talking about the basic invarient we have for our removal process:
ib_unregister_device must fence all calls into the driver, so the
driver can proceed with the low level remove on the asumption that
nothing will call into it after unregister returns.

Clients create this fence by calling things like flush_workqueue and
ib_unregister_event_handler.

If the fence is moved out of ib_unregister_device and into release
then the whole model breaks.

I've explained this all before on the v1 of these series..

> That's obviously wrong, but it's also easy to fix (the obvious fix
> is that they should be calling ib_cache_cleanup_one from the top of
> ib_unregister_device versus waiting for a kref put).

Well, no, that is not the obvious fix, that puts things back the way
they were in v6. As I explained the kfrees need to be in the release
function, the async fencing needs to remain in the remove_one or
equivalent.

> Regardless though, the reason I'm taking this (and a number of other
> things) into my tree is that waiting for things to be absolutely perfect
> on a submission is an interminable waiting game.

So, I'm confused by this..

I haven't been waiting, I've looked at every release of every one of
these three big core change patch set. And found actual problems.

What I don't understand is your timing on these.. No 'hey, could we
finish these reviews' or anything, just out of the blue, v7 merged! It
was just posted yesterday!

> Not to mention that some of these fixes are quick and easy
> to do and I'd rather quit waiting 24+ hours for a respin turnaround when
> I could just hop in and make the fix myself in 15 minutes and test it
> immediately.

It would have been a lot less work for me to just fix the various
issues than to explain them to the authors. I'm hoping that investment
means the next series around will be good at V1...

> > Looking at your for-rebase branch.. please make sure you get all the
> > attributions this cycle :|
> 
> Yet *another* reason why these v6, v7, v8 patchsets are a huge time
> drain :-/.

Sagi's recent patch set is missing several attributions too..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
  2015-07-31 17:41             ` Doug Ledford
       [not found]               ` <55BBB353.7020806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-07-31 21:24               ` Or Gerlitz
  2015-07-31 22:01                 ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2015-07-31 21:24 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma@vger.kernel.org,
	Sean Hefty, Somnath Kotur, Moni Shoua, talal@mellanox.com,
	Haggai Eran, Linux Netdev List

On Fri, Jul 31, 2015 at 8:41 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 07/31/2015 12:32 PM, Jason Gunthorpe wrote:
>> On Fri, Jul 31, 2015 at 08:50:24AM -0400, Doug Ledford wrote:

>>>> So... are we ready to go with V7 upstream?
>>> Yes.  I've already pulled it in.
>> You are taking the netdev stuff without an ack from netdev??

> I've pulled it in, yes.  Dave Miller/netdev needs to ack it still.  I
> really doubt they will object to the 1st or 3rd patch, but they might
> have comments on the second.  Since they weren't copied on the original
> submission, I'll be pinging them separately.  Dave may prefer if they go
> through his tree, and that's fine, but I still need them in my tree as
> of now for testing purposes.

Doug, netdev are copied since V6 which was posed on June 24, pretty
much enough time for anyone there to comment if needed. So the
remaining thing to take care of is ask Dave if he's OK that these
patches will go upstream through your tree or his.

>> I've been too busy too look at v7, but a quick check of the 'move the
>> cache into core code stuff' looks wrong to
>> me. ib_unregister_event_handler and flush_workqueue should not/can not be
>> called from a kref free'r.

> Please be more specific here.  What are you objecting to?  Are you
> objecting to a flush_workqueue from a release() context?  Because I
> don't see anything in the kref documentation or the kref implementation
> that prevents or prohibits this.  Or are you referring to the fact that
> they aren't unregistering their event handler until well after the
> parent device is unregistered?  That's obviously wrong, but it's also
> easy to fix (the obvious fix is that they should be calling
> ib_cache_cleanup_one from the top of ib_unregister_device versus waiting
> for a kref put).

Yes, the cover letter states the v6 --> v7 changes and if this is new
complaint, then indeed @ this point it would be fair to have it
addressed in incremental patch, as Doug suggested. Jason, it's wrong
to send developers again and again to fix things which were not
perfect in Vn-1 but also not being covered by reviewers on Vn-1, at
some point the reviewer can't load new comments which gate the
acceptance but rather be willing for things to be fixed incrementally.

Or.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
  2015-07-31 21:24               ` Or Gerlitz
@ 2015-07-31 22:01                 ` Jason Gunthorpe
  2015-08-01 21:48                   ` Or Gerlitz
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2015-07-31 22:01 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Matan Barak, linux-rdma@vger.kernel.org, Sean Hefty,
	Somnath Kotur, Moni Shoua, talal@mellanox.com, Haggai Eran,
	Linux Netdev List

On Sat, Aug 01, 2015 at 12:24:23AM +0300, Or Gerlitz wrote:

> addressed in incremental patch, as Doug suggested. Jason, it's wrong
> to send developers again and again to fix things which were 
> perfect in Vn-1 but also not being covered by reviewers on Vn-1, at
> some point the reviewer can't load new comments which gate the

I don't even know what you are talking about Or.

v6 had some small problems in the logic and v7 introduces a fairly
serious flaw while trying to fix them. IMHO, you are better to merge
v6 than v7, at least v6's problems are less likely to be serious.

IMHO, it took until around v5/v6 before this series was even worth
taking a detailed look at. I'm certainly not willing to waste my time
doing detailed reviews on other elements when basic things like
locking and refcounting are screwed up.

I think you are really off side with the idea that a review has to see
every problem in Vn or ignore it in Vn+1. That is obviously
unworkable.

> acceptance but rather be willing for things to be fixed incrementally.

That is the same argument you used for the timestamp _ex UAPI mess,
last cycle, where are the incremental fixes for that?

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]                   ` <20150731201828.GA28263-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-01  0:30                     ` Doug Ledford
       [not found]                       ` <be961828-39ff-4a15-863e-93cc86dfcf49@email.android.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Doug Ledford @ 2015-08-01  0:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 8215 bytes --]

On 07/31/2015 04:18 PM, Jason Gunthorpe wrote:
> On Fri, Jul 31, 2015 at 01:41:39PM -0400, Doug Ledford wrote:
>> Please be more specific here.  What are you objecting to?  Are you
>> objecting to a flush_workqueue from a release() context?  Because I
>> don't see anything in the kref documentation or the kref
>> implementation that prevents or prohibits this.  Or are you
>> referring to the fact that they aren't unregistering their event
>> handler until well after the parent device is unregistered?
> 
> I'm talking about the basic invarient we have for our removal process:
> ib_unregister_device must fence all calls into the driver, so the
> driver can proceed with the low level remove on the asumption that
> nothing will call into it after unregister returns.

Agreed.

> Clients create this fence by calling things like flush_workqueue and
> ib_unregister_event_handler.

Agreed.

> If the fence is moved out of ib_unregister_device and into release
> then the whole model breaks.

Agreed.

> I've explained this all before on the v1 of these series..
> 
>> That's obviously wrong, but it's also easy to fix (the obvious fix
>> is that they should be calling ib_cache_cleanup_one from the top of
>> ib_unregister_device versus waiting for a kref put).
> 
> Well, no, that is not the obvious fix, that puts things back the way
> they were in v6. As I explained the kfrees need to be in the release
> function,

Disagree.

> the async fencing needs to remain in the remove_one or
> equivalent.

Agree.

So, my point of disagreement above.  Except in circumstances with a damn
good reason, I expect code to be symmetrical.  If you allocate some
caches in ib_cache_init_one(), I expect you to kfree those caches in
ib_cache_cleanup_one().  Further, if you call ib_cache_init_one() as one
of the last items before returning from ib_register_device(), then I
expect you to call ib_cache_cleanup_one() as one of the first items in
ib_unregister_device().

However, the fencing doesn't belong to ib_cache_cleanup_one(), it
belongs at the head of ib_unregister_device() and ib_cache_cleanup_one()
should only be called after the fence is complete.

And in this respect, removing the device from the various clients is
actually part of the fencing.  The order of ops in ib_register_device is:

ib_device_register_sysfs
ib_cache_init_one
for client_list {
    add_client_context
    client->add
}

Consequently, the order of ops in ib_unregister_device should be:

for client_list
    mark coming down
for client_list
    remove(client)
for client_list
    kfree(client)
ib_cache_cleanup_one <- missing
ib_device_unregister_sysfs

In many respects, I expect the ib_unregister_device() call to mirror the
error unwind found in the register call with the modifications for
dealing with a device that was actually live.

It's certainly possible in certain circumstances to veer from this
paradigm, but I expect a good reason for doing so.  A person who isn't
immediately RDMA aware can come in and fix a locking bug when the code
is like it is above.  If you start playing subtle tricks with the
locking and ordering of removal, you start to trip up anyone that
doesn't have first hand knowledge of those tricks and why they were
implemented.  That negatively impacts the long term support of the code,
so the reason should be sufficient to warrant the drawback and the
subtle trick should be documented in the code so someone coming around
later knows what's going on.

To a certain extent, your comments in the v6 thread about the cache
being expected to work beyond the life of the device and therefore part
of the core stack are somewhat reasonable, but the life of the cache on
a per device basis is no different than the life of the client
registration.  The above flow would remove the cache from the client
list, but will still in effect treat it just like a client.  The only
difference (and one that I think is correct), is that by taking it out
of the client_list and those loops, you are able to make sure it's the
first client that comes up and the last one that goes down and that it
continues to work for as long as it needs to (after all, once all of the
clients are unregistered and removed for that device, and we are in the
ib_unregister_device routine, the device is going away and no more calls
into the cache will arrive for it).

>> Regardless though, the reason I'm taking this (and a number of other
>> things) into my tree is that waiting for things to be absolutely perfect
>> on a submission is an interminable waiting game.
> 
> So, I'm confused by this..
> 
> I haven't been waiting, I've looked at every release of every one of
> these three big core change patch set. And found actual problems.

Some of lesser and greater importance.  Some worth resubmitting the
entire patchset, and some that I would be perfectly OK with an
incremental patch later on.  A problem does not always mean resubmit, it
can mean fix before merge window with another patch (or during merge
window depending on the situation).

> What I don't understand is your timing on these.. No 'hey, could we
> finish these reviews' or anything, just out of the blue, v7 merged! It
> was just posted yesterday!

That's because v6 was mostly done and the v6 -> v7 changes were minor.
I would have taken v6 and then accepted incremental patches had it still
applied.

These branches are from my internal tree and really are WIP branches.
I've pushed those WIP branches to github.  As I stated when I set my two
different repos up, my github repo is my WIP repo.  It's for people to
be able to see where we are, pull it if they wish, and preferably to
test against ahead of time.  I *want* people to be able to see where I'm
going.  But that doesn't mean it is fixed in stone.  For things like the
fix to the v7 version of this patch, I will accept an incremental patch,
but it need not have a complete changelog because if it just fixes one
of the patches in this submission, I'm OK with squashing it down into
the original patch in order to keep the patch count down and
bisectability up.  But that can only be done prior to the merge window.
 After the merge window opens and the code heads to Linus, fixes must be
discreet patches.

So you can actually break the merge/testing window down into two
distinct phases: pre-pull (where we can fix broken patches by squashing
a fix into and thereby preserve bisections) and post-pull (which is what
we are all used to).

This merge window is riskier than most because of the patches queued up
to go in.  By pulling all of this together and publishing my git repo
early, I'm inviting other interested parties to join me in the early
testing process where we can fix problems and also preserve bisection.

So that's why I'm pulling things together and posting my git repo ahead
of the merge window.  To speak more directly to the timing issue though,
there is only about 3 weeks left before the merge window opens, and all
of next week I'm going to be working on other things.  So I wanted to
get something out that people can look at and test with before I have to
redirect for a week, and that will still leave me two weeks to integrate
fixes prior to the merge window opening.

>> Not to mention that some of these fixes are quick and easy
>> to do and I'd rather quit waiting 24+ hours for a respin turnaround when
>> I could just hop in and make the fix myself in 15 minutes and test it
>> immediately.
> 
> It would have been a lot less work for me to just fix the various
> issues than to explain them to the authors. I'm hoping that investment
> means the next series around will be good at V1...
> 
>>> Looking at your for-rebase branch.. please make sure you get all the
>>> attributions this cycle :|
>>
>> Yet *another* reason why these v6, v7, v8 patchsets are a huge time
>> drain :-/.
> 
> Sagi's recent patch set is missing several attributions too..
> 
> Jason
> 

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
  2015-07-31 22:01                 ` Jason Gunthorpe
@ 2015-08-01 21:48                   ` Or Gerlitz
       [not found]                     ` <CAJ3xEMjgvgvc8WyUhHagiH4gNs6KK7pL_TmhXCywNiWEy4p70w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2015-08-01 21:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Matan Barak
  Cc: Doug Ledford, linux-rdma@vger.kernel.org, Sean Hefty,
	Somnath Kotur, Moni Shoua, talal@mellanox.com, Haggai Eran,
	Linux Netdev List

On Sat, Aug 1, 2015 at 1:01 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Sat, Aug 01, 2015 at 12:24:23AM +0300, Or Gerlitz wrote:
>
>> addressed in incremental patch, as Doug suggested. Jason, it's wrong
>> to send developers again and again to fix things which were
>> perfect in Vn-1 but also not being covered by reviewers on Vn-1, at
>> some point the reviewer can't load new comments which gate the
>
> I don't even know what you are talking about Or.
>
> v6 had some small problems in the logic and v7 introduces a fairly
> serious flaw while trying to fix them. IMHO, you are better to merge
> v6 than v7, at least v6's problems are less likely to be serious.

Jason, can you be more specific? I don't see any comments from you
expect for the cover-letter, so if something broke out, sure, a fix is
needed, but what is that?

> That is the same argument you used for the timestamp _ex UAPI mess,
> last cycle, where are the incremental fixes for that?

I remember you have provided review comment which pointed that the
time-stamping series stepped on something which was there before needs
some cleanup, not a real mess to my taste. Matan, do have the plan to
do that work?

Or.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]                         ` <be961828-39ff-4a15-863e-93cc86dfcf49-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
@ 2015-08-02  3:03                           ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2015-08-02  3:03 UTC (permalink / raw)
  To: Jason Gunthorpe, Doug Ledford
  Cc: Or Gerlitz, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Sean Hefty, Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w, Haggai Eran, Linux Netdev List

> In many respects, I expect the ib_unregister_device() call to mirror the
>  error unwind found in the register call with the modifications for
>  dealing with a device that was actually live.

Yes, it should look like that, I also noticed there were ordering
problems in this area. and we probably have some long standing
use-after-free bugs in here which are hopefully harmless..

However, the kfrees should still be in release.

> To a certain extent, your comments in the v6 thread about the cache
> being expected to work beyond the life of the device and therefore part
> of the core stack are somewhat reasonable, but the life of the cache on
> a per device basis is no different than the life of the client
> registration.

Well, no, later patches in the series add calls into this code from
the drivers, and I looked into those and it wasn't obviously clear
that they were or could be made safe, or that future uses from drivers
would also be safe.

I looked at all of this, and thought about addressing the issue by
suggesting reordering removal, but it just doesn't guarantee safety,
requires too much auditing to prove and is a lot more work. Moving the
kfree is 10 lines or so and is guaranteed correct.

I disagree this is subtle or unexpected design pattern. The standard
kernel pattern with kref is a create/remove/release triplet. When you
have kref controlled structure every single member in that structure
needs to be analyzed for lifetime and the unwind(s) needs to be placed
in remove or release. For this cache code, that analysis tells me the
kfree unwind needs to live in release and the workqueue/event stuff in
remove. That is how I noticed to start with: a kfree of a member
inside a kref'd structure outside of release looks out of place,
especially if there is no locking protecting it...

The idiom is also to often use kref_put as part of the error unwind in
create, and since v7 doesn't do this it introduces a double kfree bug
as well..

At the end of the day this really shouldn't be the problem of the
caller, as long as the ib_device pointer is valid then core API should
be callable without causing a use-after-free.

> These branches are from my internal tree and really are WIP branches.
> I've pushed those WIP branches to github.  As I stated when I set my two

Sure testing sounds reasonable, maybe you need a different phrase for
this situation than 'thanks applied'.

Anyhow, it is a holiday here till Tuesday, hopefully Matan can fix
this up for your testing before then.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]                     ` <CAJ3xEMjgvgvc8WyUhHagiH4gNs6KK7pL_TmhXCywNiWEy4p70w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-08-02  7:56                       ` Matan Barak
       [not found]                         ` <55BDCD36.6000301-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Matan Barak @ 2015-08-02  7:56 UTC (permalink / raw)
  To: Or Gerlitz, Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty, Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List



On 8/2/2015 12:48 AM, Or Gerlitz wrote:
> On Sat, Aug 1, 2015 at 1:01 AM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> On Sat, Aug 01, 2015 at 12:24:23AM +0300, Or Gerlitz wrote:
>>
>>> addressed in incremental patch, as Doug suggested. Jason, it's wrong
>>> to send developers again and again to fix things which were
>>> perfect in Vn-1 but also not being covered by reviewers on Vn-1, at
>>> some point the reviewer can't load new comments which gate the
>>
>> I don't even know what you are talking about Or.
>>
>> v6 had some small problems in the logic and v7 introduces a fairly
>> serious flaw while trying to fix them. IMHO, you are better to merge
>> v6 than v7, at least v6's problems are less likely to be serious.
>
> Jason, can you be more specific? I don't see any comments from you
> expect for the cover-letter, so if something broke out, sure, a fix is
> needed, but what is that?
>
>> That is the same argument you used for the timestamp _ex UAPI mess,
>> last cycle, where are the incremental fixes for that?
>
> I remember you have provided review comment which pointed that the
> time-stamping series stepped on something which was there before needs
> some cleanup, not a real mess to my taste. Matan, do have the plan to
> do that work?

Indeed this design flaw was introduced when the first legacy verb was
extended. I think that falling back from extended code to legacy code
should be in the uverbs code. ib_uverbs_write will return -ENOSYS only
if both extended and non-extended don't exist. The uverbs command itself 
will call the non-extended form if the comp_mask is zero and all
data between legacy size and the given size are zero as well.
What do you think?

>
> Or.
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core
       [not found]                         ` <55BDCD36.6000301-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-06  4:53                           ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2015-08-06  4:53 UTC (permalink / raw)
  To: Matan Barak
  Cc: Or Gerlitz, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Somnath Kotur, Moni Shoua,
	talal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Haggai Eran,
	Linux Netdev List

On Sun, Aug 02, 2015 at 10:56:38AM +0300, Matan Barak wrote:

> Indeed this design flaw was introduced when the first legacy verb was
> extended. I think that falling back from extended code to legacy code
> should be in the uverbs code. ib_uverbs_write will return -ENOSYS only
> if both extended and non-extended don't exist. The uverbs command itself
> will call the non-extended form if the comp_mask is zero and all
> data between legacy size and the given size are zero as well.
> What do you think?

Yes, that is what I had expected from day 1.. Didn't notice it in the
first introduction.

Thanks,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-08-06  4:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-30 15:33 [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Matan Barak
2015-07-30 15:33 ` [PATCH for-next V7 01/10] net/ipv6: Export addrconf_ifid_eui48 Matan Barak
2015-07-30 15:33 ` [PATCH for-next V7 02/10] net: Add info for NETDEV_CHANGEUPPER event Matan Barak
     [not found] ` <1438270411-17648-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-30 15:33   ` [PATCH for-next V7 03/10] net/bonding: Export bond_option_active_slave_get_rcu Matan Barak
2015-07-31  9:40 ` [PATCH for-next V7 00/10] Move RoCE GID management to IB/Core Or Gerlitz
     [not found]   ` <CAJ3xEMh9z=LG7dW0nBP8Zdrbf0SkZ_GCxjCoowWXuw2dueN+xg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-31 12:50     ` Doug Ledford
     [not found]       ` <55BB6F10.5030408-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-07-31 16:32         ` Jason Gunthorpe
     [not found]           ` <20150731163201.GA6027-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-31 17:41             ` Doug Ledford
     [not found]               ` <55BBB353.7020806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-07-31 20:18                 ` Jason Gunthorpe
     [not found]                   ` <20150731201828.GA28263-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-08-01  0:30                     ` Doug Ledford
     [not found]                       ` <be961828-39ff-4a15-863e-93cc86dfcf49@email.android.com>
     [not found]                         ` <be961828-39ff-4a15-863e-93cc86dfcf49-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>
2015-08-02  3:03                           ` Jason Gunthorpe
2015-07-31 21:24               ` Or Gerlitz
2015-07-31 22:01                 ` Jason Gunthorpe
2015-08-01 21:48                   ` Or Gerlitz
     [not found]                     ` <CAJ3xEMjgvgvc8WyUhHagiH4gNs6KK7pL_TmhXCywNiWEy4p70w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-08-02  7:56                       ` Matan Barak
     [not found]                         ` <55BDCD36.6000301-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-06  4:53                           ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).