Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* [PATCH rdma-next v1 0/2] Let IB core distribute cache update events
@ 2019-10-29 11:53 Leon Romanovsky
  2019-10-29 11:53 ` [PATCH rdma-next v1 1/2] IB/core: " Leon Romanovsky
  2019-10-29 11:53 ` [PATCH rdma-next v1 2/2] IB/core: Cut down single member ib_cache structure Leon Romanovsky
  0 siblings, 2 replies; 12+ messages in thread
From: Leon Romanovsky @ 2019-10-29 11:53 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
v0->v1: https://lore.kernel.org/linux-rdma/20191020065427.8772-1-leon@kernel.org
 - Instead of conditionally notifying event in wq, always use WQ
   This will allow to simplify the IB core and ULPs to get rid of
   additional work.
 - Updated kdoc comment to reflect the notifier execution context.

--------------------------------------------------------------------------------------
From Parav,

Currently when low level driver notifies Pkey, GID, port change
events, they are notified to the registered handlers in the order
they are registered.

IB core and other ULPs such as IPoIB are interested in GID, LID,
Pkey change events. Since all GID query done by ULPs is serviced
by IB core, IB core is yet to update the GID cache when IPoIB
queries the GID, resulting into not updating IPoIB address.
Detailed flow is shown in the patch-1.

Hence, all events which require cache update are handled first by
the IB core. Once cache update work is completed, IB core distributes
the event to subscriber clients.

This is tested with opensm's /etc/rdma/opensm.conf subnet_prefix
configuration update where before the update

$ ip link show dev ib0
ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:01:07:fe:80:00:00:00:00:00:00:24:8a:07:03:00:b3:d1:12 brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

And after the subnet prefix update:

ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:01:07:fe:80:00:00:00:00:00:02:24:8a:07:03:00:b3:d1:12 brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff


Parav Pandit (2):
  IB/core: Let IB core distribute cache update events
  IB/core: Cut down single member ib_cache structure

 drivers/infiniband/core/cache.c     | 114 +++++++++++++---------------
 drivers/infiniband/core/core_priv.h |   2 +
 drivers/infiniband/core/device.c    |  27 ++++---
 include/rdma/ib_verbs.h             |   8 +-
 4 files changed, 74 insertions(+), 77 deletions(-)

--
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 11:53 [PATCH rdma-next v1 0/2] Let IB core distribute cache update events Leon Romanovsky
@ 2019-10-29 11:53 ` Leon Romanovsky
  2019-10-29 13:31   ` Parav Pandit
  2019-10-29 15:24   ` Jason Gunthorpe
  2019-10-29 11:53 ` [PATCH rdma-next v1 2/2] IB/core: Cut down single member ib_cache structure Leon Romanovsky
  1 sibling, 2 replies; 12+ messages in thread
From: Leon Romanovsky @ 2019-10-29 11:53 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Parav Pandit <parav@mellanox.com>

Currently when low level driver notifies Pkey, GID, port change events,
they are notified to the registered handlers in the order they are
registered.
IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
change events.
Since all GID query done by ULPs is serviced by IB core, in below flow
when GID change event occurs, IB core is yet to update the GID cache
when IPoIB queries the GID, resulting into not updating IPoIB address.

mlx5_ib_handle_event()
  ib_dispatch_event()
    ib_cache_event()
       queue_work() -> slow cache update

    [..]
    ipoib_event()
     queue_work()
       [..]
       work handler
         ipoib_ib_dev_flush_light()
           __ipoib_ib_dev_flush()
              ipoib_dev_addr_changed_valid()
                rdma_query_gid() <- Returns old GID, cache not updated.

Hence, all events which require cache update are handled first by the IB
core. Once cache update work is completed, IB core distributes the event
to subscriber clients.

Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cache.c     | 84 ++++++++++++++---------------
 drivers/infiniband/core/core_priv.h |  2 +
 drivers/infiniband/core/device.c    | 27 ++++++----
 include/rdma/ib_verbs.h             |  1 -
 4 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index d535995711c3..1d5c0df7dfab 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -51,9 +51,8 @@ struct ib_pkey_cache {

 struct ib_update_work {
 	struct work_struct work;
-	struct ib_device  *device;
-	u8                 port_num;
-	bool		   enforce_security;
+	struct ib_event event;
+	bool enforce_security;
 };

 union ib_gid zgid;
@@ -130,7 +129,7 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 	event.element.port_num	= port;
 	event.event		= IB_EVENT_GID_CHANGE;

-	ib_dispatch_event(&event);
+	ib_dispatch_cache_event_clients(&event);
 }

 static const char * const gid_type_str[] = {
@@ -1381,9 +1380,8 @@ static int config_non_roce_gid_cache(struct ib_device *device,
 	return ret;
 }

-static void ib_cache_update(struct ib_device *device,
-			    u8                port,
-			    bool	      enforce_security)
+static int
+ib_cache_update(struct ib_device *device, u8 port, bool	enforce_security)
 {
 	struct ib_port_attr       *tprops = NULL;
 	struct ib_pkey_cache      *pkey_cache = NULL, *old_pkey_cache;
@@ -1391,11 +1389,11 @@ static void ib_cache_update(struct ib_device *device,
 	int                        ret;

 	if (!rdma_is_port_valid(device, port))
-		return;
+		return -EINVAL;

 	tprops = kmalloc(sizeof *tprops, GFP_KERNEL);
 	if (!tprops)
-		return;
+		return -ENOMEM;

 	ret = ib_query_port(device, port, tprops);
 	if (ret) {
@@ -1413,8 +1411,10 @@ static void ib_cache_update(struct ib_device *device,
 	pkey_cache = kmalloc(struct_size(pkey_cache, table,
 					 tprops->pkey_tbl_len),
 			     GFP_KERNEL);
-	if (!pkey_cache)
+	if (!pkey_cache) {
+		ret = -ENOMEM;
 		goto err;
+	}

 	pkey_cache->table_len = tprops->pkey_tbl_len;

@@ -1446,49 +1446,48 @@ static void ib_cache_update(struct ib_device *device,

 	kfree(old_pkey_cache);
 	kfree(tprops);
-	return;
+	return 0;

 err:
 	kfree(pkey_cache);
 	kfree(tprops);
+	return ret;
 }

 static void ib_cache_task(struct work_struct *_work)
 {
 	struct ib_update_work *work =
 		container_of(_work, struct ib_update_work, work);
+	int ret;
+
+	ret = ib_cache_update(work->event.device, work->event.element.port_num,
+			      work->enforce_security);
+
+	/* GID event is notified already for individual GID entries by
+	 * dispatch_gid_change_event(). Hence, notifiy for rest of the
+	 * events.
+	 */
+	if (!ret && work->event.event != IB_EVENT_GID_CHANGE)
+		ib_dispatch_cache_event_clients(&work->event);

-	ib_cache_update(work->device,
-			work->port_num,
-			work->enforce_security);
 	kfree(work);
 }

-static void ib_cache_event(struct ib_event_handler *handler,
-			   struct ib_event *event)
+void ib_enqueue_cache_update_event(const struct ib_event *event)
 {
 	struct ib_update_work *work;

-	if (event->event == IB_EVENT_PORT_ERR    ||
-	    event->event == IB_EVENT_PORT_ACTIVE ||
-	    event->event == IB_EVENT_LID_CHANGE  ||
-	    event->event == IB_EVENT_PKEY_CHANGE ||
-	    event->event == IB_EVENT_CLIENT_REREGISTER ||
-	    event->event == IB_EVENT_GID_CHANGE) {
-		work = kmalloc(sizeof *work, GFP_ATOMIC);
-		if (work) {
-			INIT_WORK(&work->work, ib_cache_task);
-			work->device   = event->device;
-			work->port_num = event->element.port_num;
-			if (event->event == IB_EVENT_PKEY_CHANGE ||
-			    event->event == IB_EVENT_GID_CHANGE)
-				work->enforce_security = true;
-			else
-				work->enforce_security = false;
-
-			queue_work(ib_wq, &work->work);
-		}
-	}
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return;
+
+	INIT_WORK(&work->work, ib_cache_task);
+	work->event = *event;
+	if (event->event == IB_EVENT_PKEY_CHANGE ||
+	    event->event == IB_EVENT_GID_CHANGE)
+		work->enforce_security = true;
+
+	queue_work(ib_wq, &work->work);
 }

 int ib_cache_setup_one(struct ib_device *device)
@@ -1505,9 +1504,6 @@ int ib_cache_setup_one(struct ib_device *device)
 	rdma_for_each_port (device, p)
 		ib_cache_update(device, p, true);

-	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
-			      device, ib_cache_event);
-	ib_register_event_handler(&device->cache.event_handler);
 	return 0;
 }

@@ -1529,14 +1525,12 @@ void ib_cache_release_one(struct ib_device *device)

 void ib_cache_cleanup_one(struct ib_device *device)
 {
-	/* The cleanup function unregisters the event handler,
-	 * waits for all in-progress workqueue elements and cleans
-	 * up the GID cache. This function should be called after
-	 * the device was removed from the devices list and all
-	 * clients were removed, so the cache exists but is
+	/* The cleanup function waits for all in-progress workqueue
+	 * elements and cleans up the GID cache. This function should be
+	 * called after the device was removed from the devices list and
+	 * all clients were removed, so the cache exists but is
 	 * non-functional and shouldn't be updated anymore.
 	 */
-	ib_unregister_event_handler(&device->cache.event_handler);
 	flush_workqueue(ib_wq);
 	gid_table_cleanup_one(device);

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3a8b0911c3bc..137c98098489 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -149,6 +149,8 @@ unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);
+void ib_enqueue_cache_update_event(const struct ib_event *event);
+void ib_dispatch_cache_event_clients(struct ib_event *event);

 #ifdef CONFIG_CGROUP_RDMA
 void ib_device_register_rdmacg(struct ib_device *device);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index f8d383ceae05..9db229c628e9 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1931,8 +1931,8 @@ EXPORT_SYMBOL(ib_set_client_data);
  *
  * ib_register_event_handler() registers an event handler that will be
  * called back when asynchronous IB events occur (as defined in
- * chapter 11 of the InfiniBand Architecture Specification).  This
- * callback may occur in interrupt context.
+ * chapter 11 of the InfiniBand Architecture Specification). This
+ * callback always occurring in workqueue context.
  */
 void ib_register_event_handler(struct ib_event_handler *event_handler)
 {
@@ -1962,15 +1962,7 @@ void ib_unregister_event_handler(struct ib_event_handler *event_handler)
 }
 EXPORT_SYMBOL(ib_unregister_event_handler);

-/**
- * ib_dispatch_event - Dispatch an asynchronous event
- * @event:Event to dispatch
- *
- * Low-level drivers must call ib_dispatch_event() to dispatch the
- * event to all registered event handlers when an asynchronous event
- * occurs.
- */
-void ib_dispatch_event(struct ib_event *event)
+void ib_dispatch_cache_event_clients(struct ib_event *event)
 {
 	unsigned long flags;
 	struct ib_event_handler *handler;
@@ -1982,6 +1974,19 @@ void ib_dispatch_event(struct ib_event *event)

 	spin_unlock_irqrestore(&event->device->event_handler_lock, flags);
 }
+
+/**
+ * ib_dispatch_event - Dispatch an asynchronous event
+ * @event:Event to dispatch
+ *
+ * Low-level drivers must call ib_dispatch_event() to dispatch the
+ * event to all registered event handlers when an asynchronous event
+ * occurs.
+ */
+void ib_dispatch_event(struct ib_event *event)
+{
+	ib_enqueue_cache_update_event(event);
+}
 EXPORT_SYMBOL(ib_dispatch_event);

 static int iw_query_port(struct ib_device *device,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0626b62ed107..6604115f7c85 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2148,7 +2148,6 @@ struct ib_port_cache {

 struct ib_cache {
 	rwlock_t                lock;
-	struct ib_event_handler event_handler;
 };

 struct ib_port_immutable {
--
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH rdma-next v1 2/2] IB/core: Cut down single member ib_cache structure
  2019-10-29 11:53 [PATCH rdma-next v1 0/2] Let IB core distribute cache update events Leon Romanovsky
  2019-10-29 11:53 ` [PATCH rdma-next v1 1/2] IB/core: " Leon Romanovsky
@ 2019-10-29 11:53 ` Leon Romanovsky
  1 sibling, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2019-10-29 11:53 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Parav Pandit <parav@mellanox.com>

Given that ib_cache structure has only single member now, merge the
cache lock directly in the ib_device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cache.c | 30 +++++++++++++++---------------
 include/rdma/ib_verbs.h         |  7 ++-----
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 1d5c0df7dfab..1d6f1a20194d 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1033,7 +1033,7 @@ int ib_get_cached_pkey(struct ib_device *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 
 	cache = device->port_data[port_num].cache.pkey;
 
@@ -1042,7 +1042,7 @@ int ib_get_cached_pkey(struct ib_device *device,
 	else
 		*pkey = cache->table[index];
 
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return ret;
 }
@@ -1057,9 +1057,9 @@ int ib_get_cached_subnet_prefix(struct ib_device *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 	*sn_pfx = device->port_data[port_num].cache.subnet_prefix;
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return 0;
 }
@@ -1079,7 +1079,7 @@ int ib_find_cached_pkey(struct ib_device *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 
 	cache = device->port_data[port_num].cache.pkey;
 
@@ -1100,7 +1100,7 @@ int ib_find_cached_pkey(struct ib_device *device,
 		ret = 0;
 	}
 
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return ret;
 }
@@ -1119,7 +1119,7 @@ int ib_find_exact_cached_pkey(struct ib_device *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 
 	cache = device->port_data[port_num].cache.pkey;
 
@@ -1132,7 +1132,7 @@ int ib_find_exact_cached_pkey(struct ib_device *device,
 			break;
 		}
 
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return ret;
 }
@@ -1148,9 +1148,9 @@ int ib_get_cached_lmc(struct ib_device *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 	*lmc = device->port_data[port_num].cache.lmc;
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return ret;
 }
@@ -1166,9 +1166,9 @@ int ib_get_cached_port_state(struct ib_device   *device,
 	if (!rdma_is_port_valid(device, port_num))
 		return -EINVAL;
 
-	read_lock_irqsave(&device->cache.lock, flags);
+	read_lock_irqsave(&device->cache_lock, flags);
 	*port_state = device->port_data[port_num].cache.port_state;
-	read_unlock_irqrestore(&device->cache.lock, flags);
+	read_unlock_irqrestore(&device->cache_lock, flags);
 
 	return ret;
 }
@@ -1428,7 +1428,7 @@ ib_cache_update(struct ib_device *device, u8 port, bool	enforce_security)
 		}
 	}
 
-	write_lock_irq(&device->cache.lock);
+	write_lock_irq(&device->cache_lock);
 
 	old_pkey_cache = device->port_data[port].cache.pkey;
 
@@ -1437,7 +1437,7 @@ ib_cache_update(struct ib_device *device, u8 port, bool	enforce_security)
 	device->port_data[port].cache.port_state = tprops->state;
 
 	device->port_data[port].cache.subnet_prefix = tprops->subnet_prefix;
-	write_unlock_irq(&device->cache.lock);
+	write_unlock_irq(&device->cache_lock);
 
 	if (enforce_security)
 		ib_security_cache_change(device,
@@ -1495,7 +1495,7 @@ int ib_cache_setup_one(struct ib_device *device)
 	unsigned int p;
 	int err;
 
-	rwlock_init(&device->cache.lock);
+	rwlock_init(&device->cache_lock);
 
 	err = gid_table_setup_one(device);
 	if (err)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6604115f7c85..d78abb616096 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2146,10 +2146,6 @@ struct ib_port_cache {
 	enum ib_port_state     port_state;
 };
 
-struct ib_cache {
-	rwlock_t                lock;
-};
-
 struct ib_port_immutable {
 	int                           pkey_tbl_len;
 	int                           gid_tbl_len;
@@ -2609,7 +2605,8 @@ struct ib_device {
 	struct xarray                 client_data;
 	struct mutex                  unregistration_lock;
 
-	struct ib_cache               cache;
+	/* Synchronize GID, Pkey cache entries, subnet prefix, LMC */
+	rwlock_t cache_lock;
 	/**
 	 * port_data is indexed by port number
 	 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 11:53 ` [PATCH rdma-next v1 1/2] IB/core: " Leon Romanovsky
@ 2019-10-29 13:31   ` Parav Pandit
  2019-10-29 15:24   ` Jason Gunthorpe
  1 sibling, 0 replies; 12+ messages in thread
From: Parav Pandit @ 2019-10-29 13:31 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens



> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, October 29, 2019 6:53 AM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe
> <jgg@mellanox.com>
> Cc: Leon Romanovsky <leonro@mellanox.com>; RDMA mailing list <linux-
> rdma@vger.kernel.org>; Daniel Jurgens <danielj@mellanox.com>; Parav
> Pandit <parav@mellanox.com>
> Subject: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> update events
> 
> From: Parav Pandit <parav@mellanox.com>
> 
> Currently when low level driver notifies Pkey, GID, port change events, they
> are notified to the registered handlers in the order they are registered.
> IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey change
> events.
> Since all GID query done by ULPs is serviced by IB core, in below flow when
> GID change event occurs, IB core is yet to update the GID cache when IPoIB
> queries the GID, resulting into not updating IPoIB address.
> 
> mlx5_ib_handle_event()
>   ib_dispatch_event()
>     ib_cache_event()
>        queue_work() -> slow cache update
> 
>     [..]
>     ipoib_event()
>      queue_work()
>        [..]
>        work handler
>          ipoib_ib_dev_flush_light()
>            __ipoib_ib_dev_flush()
>               ipoib_dev_addr_changed_valid()
>                 rdma_query_gid() <- Returns old GID, cache not updated.
> 
> Hence, all events which require cache update are handled first by the IB
> core. Once cache update work is completed, IB core distributes the event to
> subscriber clients.
> 
> Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to
> cache")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/cache.c     | 84 ++++++++++++++---------------
>  drivers/infiniband/core/core_priv.h |  2 +
>  drivers/infiniband/core/device.c    | 27 ++++++----
>  include/rdma/ib_verbs.h             |  1 -
>  4 files changed, 57 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index d535995711c3..1d5c0df7dfab 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -51,9 +51,8 @@ struct ib_pkey_cache {
> 
>  struct ib_update_work {
>  	struct work_struct work;
> -	struct ib_device  *device;
> -	u8                 port_num;
> -	bool		   enforce_security;
> +	struct ib_event event;
> +	bool enforce_security;
>  };
> 
>  union ib_gid zgid;
> @@ -130,7 +129,7 @@ static void dispatch_gid_change_event(struct
> ib_device *ib_dev, u8 port)
>  	event.element.port_num	= port;
>  	event.event		= IB_EVENT_GID_CHANGE;
> 
> -	ib_dispatch_event(&event);
> +	ib_dispatch_cache_event_clients(&event);
>  }
> 
>  static const char * const gid_type_str[] = { @@ -1381,9 +1380,8 @@ static
> int config_non_roce_gid_cache(struct ib_device *device,
>  	return ret;
>  }
> 
> -static void ib_cache_update(struct ib_device *device,
> -			    u8                port,
> -			    bool	      enforce_security)
> +static int
> +ib_cache_update(struct ib_device *device, u8 port, bool
> 	enforce_security)
>  {
>  	struct ib_port_attr       *tprops = NULL;
>  	struct ib_pkey_cache      *pkey_cache = NULL, *old_pkey_cache;
> @@ -1391,11 +1389,11 @@ static void ib_cache_update(struct ib_device
> *device,
>  	int                        ret;
> 
>  	if (!rdma_is_port_valid(device, port))
> -		return;
> +		return -EINVAL;
> 
>  	tprops = kmalloc(sizeof *tprops, GFP_KERNEL);
>  	if (!tprops)
> -		return;
> +		return -ENOMEM;
> 
>  	ret = ib_query_port(device, port, tprops);
>  	if (ret) {
> @@ -1413,8 +1411,10 @@ static void ib_cache_update(struct ib_device
> *device,
>  	pkey_cache = kmalloc(struct_size(pkey_cache, table,
>  					 tprops->pkey_tbl_len),
>  			     GFP_KERNEL);
> -	if (!pkey_cache)
> +	if (!pkey_cache) {
> +		ret = -ENOMEM;
>  		goto err;
> +	}
> 
>  	pkey_cache->table_len = tprops->pkey_tbl_len;
> 
> @@ -1446,49 +1446,48 @@ static void ib_cache_update(struct ib_device
> *device,
> 
>  	kfree(old_pkey_cache);
>  	kfree(tprops);
> -	return;
> +	return 0;
> 
>  err:
>  	kfree(pkey_cache);
>  	kfree(tprops);
> +	return ret;
>  }
> 
>  static void ib_cache_task(struct work_struct *_work)  {
>  	struct ib_update_work *work =
>  		container_of(_work, struct ib_update_work, work);
> +	int ret;
> +
> +	ret = ib_cache_update(work->event.device, work-
> >event.element.port_num,
> +			      work->enforce_security);
> +
> +	/* GID event is notified already for individual GID entries by
> +	 * dispatch_gid_change_event(). Hence, notifiy for rest of the
> +	 * events.
> +	 */
> +	if (!ret && work->event.event != IB_EVENT_GID_CHANGE)
> +		ib_dispatch_cache_event_clients(&work->event);
> 
> -	ib_cache_update(work->device,
> -			work->port_num,
> -			work->enforce_security);
>  	kfree(work);
>  }
> 
> -static void ib_cache_event(struct ib_event_handler *handler,
> -			   struct ib_event *event)
> +void ib_enqueue_cache_update_event(const struct ib_event *event)
>  {
>  	struct ib_update_work *work;
> 
> -	if (event->event == IB_EVENT_PORT_ERR    ||
> -	    event->event == IB_EVENT_PORT_ACTIVE ||
> -	    event->event == IB_EVENT_LID_CHANGE  ||
> -	    event->event == IB_EVENT_PKEY_CHANGE ||
> -	    event->event == IB_EVENT_CLIENT_REREGISTER ||
> -	    event->event == IB_EVENT_GID_CHANGE) {
> -		work = kmalloc(sizeof *work, GFP_ATOMIC);
> -		if (work) {
> -			INIT_WORK(&work->work, ib_cache_task);
> -			work->device   = event->device;
> -			work->port_num = event->element.port_num;
> -			if (event->event == IB_EVENT_PKEY_CHANGE ||
> -			    event->event == IB_EVENT_GID_CHANGE)
> -				work->enforce_security = true;
> -			else
> -				work->enforce_security = false;
> -
> -			queue_work(ib_wq, &work->work);
> -		}
> -	}
> +	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> +	if (!work)
> +		return;
> +
> +	INIT_WORK(&work->work, ib_cache_task);
> +	work->event = *event;
> +	if (event->event == IB_EVENT_PKEY_CHANGE ||
> +	    event->event == IB_EVENT_GID_CHANGE)
> +		work->enforce_security = true;
> +
> +	queue_work(ib_wq, &work->work);
>  }
> 
>  int ib_cache_setup_one(struct ib_device *device) @@ -1505,9 +1504,6 @@
> int ib_cache_setup_one(struct ib_device *device)
>  	rdma_for_each_port (device, p)
>  		ib_cache_update(device, p, true);
> 
> -	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
> -			      device, ib_cache_event);
> -	ib_register_event_handler(&device->cache.event_handler);
>  	return 0;
>  }
> 
> @@ -1529,14 +1525,12 @@ void ib_cache_release_one(struct ib_device
> *device)
> 
>  void ib_cache_cleanup_one(struct ib_device *device)  {
> -	/* The cleanup function unregisters the event handler,
> -	 * waits for all in-progress workqueue elements and cleans
> -	 * up the GID cache. This function should be called after
> -	 * the device was removed from the devices list and all
> -	 * clients were removed, so the cache exists but is
> +	/* The cleanup function waits for all in-progress workqueue
> +	 * elements and cleans up the GID cache. This function should be
> +	 * called after the device was removed from the devices list and
> +	 * all clients were removed, so the cache exists but is
>  	 * non-functional and shouldn't be updated anymore.
>  	 */
> -	ib_unregister_event_handler(&device->cache.event_handler);
>  	flush_workqueue(ib_wq);
>  	gid_table_cleanup_one(device);
> 
> diff --git a/drivers/infiniband/core/core_priv.h
> b/drivers/infiniband/core/core_priv.h
> index 3a8b0911c3bc..137c98098489 100644
> --- a/drivers/infiniband/core/core_priv.h
> +++ b/drivers/infiniband/core/core_priv.h
> @@ -149,6 +149,8 @@ unsigned long roce_gid_type_mask_support(struct
> ib_device *ib_dev, u8 port);  int ib_cache_setup_one(struct ib_device
> *device);  void ib_cache_cleanup_one(struct ib_device *device);  void
> ib_cache_release_one(struct ib_device *device);
> +void ib_enqueue_cache_update_event(const struct ib_event *event); void
> +ib_dispatch_cache_event_clients(struct ib_event *event);
> 
>  #ifdef CONFIG_CGROUP_RDMA
>  void ib_device_register_rdmacg(struct ib_device *device); diff --git
> a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index f8d383ceae05..9db229c628e9 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1931,8 +1931,8 @@ EXPORT_SYMBOL(ib_set_client_data);
>   *
>   * ib_register_event_handler() registers an event handler that will be
>   * called back when asynchronous IB events occur (as defined in
> - * chapter 11 of the InfiniBand Architecture Specification).  This
> - * callback may occur in interrupt context.
> + * chapter 11 of the InfiniBand Architecture Specification). This
> + * callback always occurring in workqueue context.
>   */
>  void ib_register_event_handler(struct ib_event_handler *event_handler)  {
> @@ -1962,15 +1962,7 @@ void ib_unregister_event_handler(struct
> ib_event_handler *event_handler)  }
> EXPORT_SYMBOL(ib_unregister_event_handler);
> 
> -/**
> - * ib_dispatch_event - Dispatch an asynchronous event
> - * @event:Event to dispatch
> - *
> - * Low-level drivers must call ib_dispatch_event() to dispatch the
> - * event to all registered event handlers when an asynchronous event
> - * occurs.
> - */
> -void ib_dispatch_event(struct ib_event *event)
> +void ib_dispatch_cache_event_clients(struct ib_event *event)
>  {
>  	unsigned long flags;
>  	struct ib_event_handler *handler;
> @@ -1982,6 +1974,19 @@ void ib_dispatch_event(struct ib_event *event)
> 
>  	spin_unlock_irqrestore(&event->device->event_handler_lock, flags);
> }
> +
> +/**
> + * ib_dispatch_event - Dispatch an asynchronous event
> + * @event:Event to dispatch
> + *
> + * Low-level drivers must call ib_dispatch_event() to dispatch the
> + * event to all registered event handlers when an asynchronous event
> + * occurs.
> + */
> +void ib_dispatch_event(struct ib_event *event) {
> +	ib_enqueue_cache_update_event(event);
> +}
>  EXPORT_SYMBOL(ib_dispatch_event);
> 
>  static int iw_query_port(struct ib_device *device,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 0626b62ed107..6604115f7c85 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -2148,7 +2148,6 @@ struct ib_port_cache {
> 
>  struct ib_cache {
>  	rwlock_t                lock;
> -	struct ib_event_handler event_handler;
>  };
> 
>  struct ib_port_immutable {
> --
> 2.20.1

We noticed one Kazan report with this patch in additional QA tests. I am looking into it.
Please wait until we have more updates.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 11:53 ` [PATCH rdma-next v1 1/2] IB/core: " Leon Romanovsky
  2019-10-29 13:31   ` Parav Pandit
@ 2019-10-29 15:24   ` Jason Gunthorpe
  2019-10-29 15:32     ` Parav Pandit
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-10-29 15:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Parav Pandit

On Tue, Oct 29, 2019 at 01:53:26PM +0200, Leon Romanovsky wrote:
> -static void ib_cache_update(struct ib_device *device,
> -			    u8                port,
> -			    bool	      enforce_security)
> +static int
> +ib_cache_update(struct ib_device *device, u8 port, bool	enforce_security)
>  {

Formatting

> +/**
> + * ib_dispatch_event - Dispatch an asynchronous event
> + * @event:Event to dispatch
> + *
> + * Low-level drivers must call ib_dispatch_event() to dispatch the
> + * event to all registered event handlers when an asynchronous event
> + * occurs.
> + */
> +void ib_dispatch_event(struct ib_event *event)
> +{
> +	ib_enqueue_cache_update_event(event);
> +}
>  EXPORT_SYMBOL(ib_dispatch_event);

Why not just move this into cache.c?

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 15:24   ` Jason Gunthorpe
@ 2019-10-29 15:32     ` Parav Pandit
  2019-10-29 15:33       ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-10-29 15:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens



> -----Original Message-----
> From: Jason Gunthorpe <jgg@mellanox.com>
> Sent: Tuesday, October 29, 2019 10:24 AM
> To: Leon Romanovsky <leon@kernel.org>
> Cc: Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leonro@mellanox.com>; RDMA mailing list <linux-rdma@vger.kernel.org>;
> Daniel Jurgens <danielj@mellanox.com>; Parav Pandit <parav@mellanox.com>
> Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> update events
> 
> On Tue, Oct 29, 2019 at 01:53:26PM +0200, Leon Romanovsky wrote:
> > -static void ib_cache_update(struct ib_device *device,
> > -			    u8                port,
> > -			    bool	      enforce_security)
> > +static int
> > +ib_cache_update(struct ib_device *device, u8 port, bool
> 	enforce_security)
> >  {
> 
> Formatting
> 
Will fix it.

> > +/**
> > + * ib_dispatch_event - Dispatch an asynchronous event
> > + * @event:Event to dispatch
> > + *
> > + * Low-level drivers must call ib_dispatch_event() to dispatch the
> > + * event to all registered event handlers when an asynchronous event
> > + * occurs.
> > + */
> > +void ib_dispatch_event(struct ib_event *event) {
> > +	ib_enqueue_cache_update_event(event);
> > +}
> >  EXPORT_SYMBOL(ib_dispatch_event);
> 
> Why not just move this into cache.c?
> 
Same thought same to me when I had to add one liner call.
However the issue was device.c has the code for the event registration/unregistration and calling the handlers unrelated to cache.
So moving ib_dispatch_event() to cache.c looked incorrect to me.

> Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 15:32     ` Parav Pandit
@ 2019-10-29 15:33       ` Jason Gunthorpe
  2019-10-29 15:47         ` Parav Pandit
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-10-29 15:33 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens

On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:

> > > +/**
> > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > + * @event:Event to dispatch
> > > + *
> > > + * Low-level drivers must call ib_dispatch_event() to dispatch the
> > > + * event to all registered event handlers when an asynchronous event
> > > + * occurs.
> > > + */
> > > +void ib_dispatch_event(struct ib_event *event) {
> > > +	ib_enqueue_cache_update_event(event);
> > > +}
> > >  EXPORT_SYMBOL(ib_dispatch_event);
> > 
> > Why not just move this into cache.c?
> > 
> Same thought same to me when I had to add one liner call.
> However the issue was device.c has the code for the event registration/unregistration and calling the handlers unrelated to cache.
> So moving ib_dispatch_event() to cache.c looked incorrect to me.

Well, maybe we can move the wq code from the cache.c into here? 

It looks just as incorrect to have the one line call

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 15:33       ` Jason Gunthorpe
@ 2019-10-29 15:47         ` Parav Pandit
  2019-10-29 15:54           ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-10-29 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens



> -----Original Message-----
> From: Jason Gunthorpe <jgg@mellanox.com>
> Sent: Tuesday, October 29, 2019 10:34 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>; RDMA
> mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> <danielj@mellanox.com>
> Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> update events
> 
> On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:
> 
> > > > +/**
> > > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > > + * @event:Event to dispatch
> > > > + *
> > > > + * Low-level drivers must call ib_dispatch_event() to dispatch the
> > > > + * event to all registered event handlers when an asynchronous event
> > > > + * occurs.
> > > > + */
> > > > +void ib_dispatch_event(struct ib_event *event) {
> > > > +	ib_enqueue_cache_update_event(event);
> > > > +}
> > > >  EXPORT_SYMBOL(ib_dispatch_event);
> > >
> > > Why not just move this into cache.c?
> > >
> > Same thought same to me when I had to add one liner call.
> > However the issue was device.c has the code for the event
> registration/unregistration and calling the handlers unrelated to cache.
> > So moving ib_dispatch_event() to cache.c looked incorrect to me.
> 
> Well, maybe we can move the wq code from the cache.c into here?
> 
I prefer to keep all cache specific code in cache.c because there is where we flush the ib_wq, queue work there.

> It looks just as incorrect to have the one line call
> 
No, its not incorrect. Because device.c provides functionality to register/unregister handler and dispatch event.
While cache layer deals with cache updates etc, and uses wq service provided device.c
If we rename ib_dispatch_event() to ib_dispatch_cache_event() it make sense to move to cache.c
However we have non cache specific events there too, so current code split between cache.c and device.c looks appropriate.

> Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 15:47         ` Parav Pandit
@ 2019-10-29 15:54           ` Jason Gunthorpe
  2019-10-29 18:11             ` Parav Pandit
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-10-29 15:54 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens

On Tue, Oct 29, 2019 at 03:47:42PM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > Sent: Tuesday, October 29, 2019 10:34 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>; RDMA
> > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > <danielj@mellanox.com>
> > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> > update events
> > 
> > On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:
> > 
> > > > > +/**
> > > > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > > > + * @event:Event to dispatch
> > > > > + *
> > > > > + * Low-level drivers must call ib_dispatch_event() to dispatch the
> > > > > + * event to all registered event handlers when an asynchronous event
> > > > > + * occurs.
> > > > > + */
> > > > > +void ib_dispatch_event(struct ib_event *event) {
> > > > > +	ib_enqueue_cache_update_event(event);
> > > > > +}
> > > > >  EXPORT_SYMBOL(ib_dispatch_event);
> > > >
> > > > Why not just move this into cache.c?
> > > >
> > > Same thought same to me when I had to add one liner call.
> > > However the issue was device.c has the code for the event
> > registration/unregistration and calling the handlers unrelated to cache.
> > > So moving ib_dispatch_event() to cache.c looked incorrect to me.
> > 
> > Well, maybe we can move the wq code from the cache.c into here?
> 
> I prefer to keep all cache specific code in cache.c because there is
> where we flush the ib_wq, queue work there.

I think that is because the cache is not subscribing to a notifier, it
really should, then things order properly and the flush is not needed.

> > It looks just as incorrect to have the one line call
> 
> No, its not incorrect. Because device.c provides functionality to
> register/unregister handler and dispatch event.  While cache layer
> deals with cache updates etc, and uses wq service provided device.c
> If we rename ib_dispatch_event() to ib_dispatch_cache_event() it
> make sense to move to cache.c However we have non cache specific
> events there too, so current code split between cache.c and device.c
> looks appropriate.

It always looks bad to have a single line function call like that,
especially just for spurious reasons like file placement or
functioning naming. It shows something is being done wrong

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 15:54           ` Jason Gunthorpe
@ 2019-10-29 18:11             ` Parav Pandit
  2019-10-29 18:26               ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-10-29 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Tuesday, October 29, 2019 10:55 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>; RDMA
> mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> <danielj@mellanox.com>
> Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> update events
> 
> On Tue, Oct 29, 2019 at 03:47:42PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > Sent: Tuesday, October 29, 2019 10:34 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>;
> RDMA
> > > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > > <danielj@mellanox.com>
> > > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core
> > > distribute cache update events
> > >
> > > On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:
> > >
> > > > > > +/**
> > > > > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > > > > + * @event:Event to dispatch
> > > > > > + *
> > > > > > + * Low-level drivers must call ib_dispatch_event() to
> > > > > > +dispatch the
> > > > > > + * event to all registered event handlers when an
> > > > > > +asynchronous event
> > > > > > + * occurs.
> > > > > > + */
> > > > > > +void ib_dispatch_event(struct ib_event *event) {
> > > > > > +	ib_enqueue_cache_update_event(event);
> > > > > > +}
> > > > > >  EXPORT_SYMBOL(ib_dispatch_event);
> > > > >
> > > > > Why not just move this into cache.c?
> > > > >
> > > > Same thought same to me when I had to add one liner call.
> > > > However the issue was device.c has the code for the event
> > > registration/unregistration and calling the handlers unrelated to cache.
> > > > So moving ib_dispatch_event() to cache.c looked incorrect to me.
> > >
> > > Well, maybe we can move the wq code from the cache.c into here?
> >
> > I prefer to keep all cache specific code in cache.c because there is
> > where we flush the ib_wq, queue work there.
> 
> I think that is because the cache is not subscribing to a notifier, it really should,
> then things order properly and the flush is not needed.
> 
Cache shouldn't subscribe to the notifier, that is the race condition issue.
With this fix, cache invokes the notifier chain in wq context after all cache state is synced.
More below.

> > > It looks just as incorrect to have the one line call
> >
> > No, its not incorrect. Because device.c provides functionality to
> > register/unregister handler and dispatch event.  While cache layer
> > deals with cache updates etc, and uses wq service provided device.c If
> > we rename ib_dispatch_event() to ib_dispatch_cache_event() it make
> > sense to move to cache.c However we have non cache specific events
> > there too, so current code split between cache.c and device.c looks
> > appropriate.
> 
> It always looks bad to have a single line function call like that, especially just for
> spurious reasons like file placement or functioning naming. It shows something
> is being done wrong
> 
Actually this v1 version of the patch is incorrect.
Even though we don't have cache update event, patch calls the cache work.
My previous version was correct.

But improved version is below that addressed your comment.
I took care to get rid of one liner function too.
Inlined the patch below for review purpose. Will send through Leon's queue.
This fix exposed another dormant bug of mlx4 driver. So will send it together as pre-patch to this patch.

From 97c8a3050471e1d27fdd7f529784d1fa0c871d43 Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Thu, 26 Sep 2019 15:55:26 -0500
Subject: [PATCH] IB/core: Let IB core distribute cache update events

Currently when low level driver notifies Pkey, GID, port change events,
they are notified to the registered handlers in the order they are
registered.
IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey
change events.
Since all GID query done by ULPs is serviced by IB core, in below flow
when GID change event occurs, IB core is yet to update the GID cache
when IPoIB queries the GID, resulting into not updating IPoIB address.

mlx5_ib_handle_event()
  ib_dispatch_event()
    ib_cache_event()
       queue_work() -> slow cache update

    [..]
    ipoib_event()
     queue_work()
       [..]
       work handler
         ipoib_ib_dev_flush_light()
           __ipoib_ib_dev_flush()
              ipoib_dev_addr_changed_valid()
                rdma_query_gid() <- Returns old GID, cache not updated.

Hence, all events which require cache update are handled first by the IB
core. Once cache update work is completed, IB core distributes the event
to subscriber clients.

---
Changelog:
v1->v2:
 - Fixed not to call cache update when it is not a cache related event
 - Always invoke notifier handler in workqueue context
 - Removed incorrect comment about caller context of handler as it needs
   some more work
 - Replaced handler spin lock with rwsemaphore to make use of workqueue
   context
 - Moved ib_dispatch_event to cache.c as it distributes all events in wq
   managed mostly by cache.c

Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache")
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/infiniband/core/cache.c     | 121 +++++++++++++++++-----------
 drivers/infiniband/core/core_priv.h |   1 +
 drivers/infiniband/core/device.c    |  11 +--
 include/rdma/ib_verbs.h             |   3 +-
 4 files changed, 78 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 00fb3eacda19..65b10efca2b8 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -51,9 +51,8 @@ struct ib_pkey_cache {
 
 struct ib_update_work {
 	struct work_struct work;
-	struct ib_device  *device;
-	u8                 port_num;
-	bool		   enforce_security;
+	struct ib_event event;
+	bool enforce_security;
 };
 
 union ib_gid zgid;
@@ -130,7 +129,7 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 	event.element.port_num	= port;
 	event.event		= IB_EVENT_GID_CHANGE;
 
-	ib_dispatch_event(&event);
+	ib_dispatch_event_clients(&event);
 }
 
 static const char * const gid_type_str[] = {
@@ -1387,9 +1386,8 @@ static int config_non_roce_gid_cache(struct ib_device *device,
 	return ret;
 }
 
-static void ib_cache_update(struct ib_device *device,
-			    u8                port,
-			    bool	      enforce_security)
+static int
+ib_cache_update(struct ib_device *device, u8 port, bool enforce_security)
 {
 	struct ib_port_attr       *tprops = NULL;
 	struct ib_pkey_cache      *pkey_cache = NULL, *old_pkey_cache;
@@ -1397,11 +1395,11 @@ static void ib_cache_update(struct ib_device *device,
 	int                        ret;
 
 	if (!rdma_is_port_valid(device, port))
-		return;
+		return -EINVAL;
 
 	tprops = kmalloc(sizeof *tprops, GFP_KERNEL);
 	if (!tprops)
-		return;
+		return -ENOMEM;
 
 	ret = ib_query_port(device, port, tprops);
 	if (ret) {
@@ -1419,8 +1417,10 @@ static void ib_cache_update(struct ib_device *device,
 	pkey_cache = kmalloc(struct_size(pkey_cache, table,
 					 tprops->pkey_tbl_len),
 			     GFP_KERNEL);
-	if (!pkey_cache)
+	if (!pkey_cache) {
+		ret = -ENOMEM;
 		goto err;
+	}
 
 	pkey_cache->table_len = tprops->pkey_tbl_len;
 
@@ -1452,50 +1452,84 @@ static void ib_cache_update(struct ib_device *device,
 
 	kfree(old_pkey_cache);
 	kfree(tprops);
-	return;
+	return 0;
 
 err:
 	kfree(pkey_cache);
 	kfree(tprops);
+	return ret;
+}
+
+static void ib_cache_event_task(struct work_struct *_work)
+{
+	struct ib_update_work *work =
+		container_of(_work, struct ib_update_work, work);
+	int ret;
+
+	/* Before distributing the cache update event, first sync
+	 * the cache.
+	 */
+	ret = ib_cache_update(work->event.device, work->event.element.port_num,
+			      work->enforce_security);
+
+	/* GID event is notified already for individual GID entries by
+	 * dispatch_gid_change_event(). Hence, notifiy for rest of the
+	 * events.
+	 */
+	if (!ret && work->event.event != IB_EVENT_GID_CHANGE)
+		ib_dispatch_event_clients(&work->event);
+
+	kfree(work);
 }
 
-static void ib_cache_task(struct work_struct *_work)
+static void ib_generic_event_task(struct work_struct *_work)
 {
 	struct ib_update_work *work =
 		container_of(_work, struct ib_update_work, work);
 
-	ib_cache_update(work->device,
-			work->port_num,
-			work->enforce_security);
+	ib_dispatch_event_clients(&work->event);
 	kfree(work);
 }
 
-static void ib_cache_event(struct ib_event_handler *handler,
-			   struct ib_event *event)
+static bool is_cache_update_event(const struct ib_event *event)
+{
+	return (event->event == IB_EVENT_PORT_ERR    ||
+		event->event == IB_EVENT_PORT_ACTIVE ||
+		event->event == IB_EVENT_LID_CHANGE  ||
+		event->event == IB_EVENT_PKEY_CHANGE ||
+		event->event == IB_EVENT_CLIENT_REREGISTER ||
+		event->event == IB_EVENT_GID_CHANGE);
+}
+
+/**
+ * ib_dispatch_event - Dispatch an asynchronous event
+ * @event:Event to dispatch
+ *
+ * Low-level drivers must call ib_dispatch_event() to dispatch the
+ * event to all registered event handlers when an asynchronous event
+ * occurs.
+ */
+void ib_dispatch_event(const struct ib_event *event)
 {
 	struct ib_update_work *work;
 
-	if (event->event == IB_EVENT_PORT_ERR    ||
-	    event->event == IB_EVENT_PORT_ACTIVE ||
-	    event->event == IB_EVENT_LID_CHANGE  ||
-	    event->event == IB_EVENT_PKEY_CHANGE ||
-	    event->event == IB_EVENT_CLIENT_REREGISTER ||
-	    event->event == IB_EVENT_GID_CHANGE) {
-		work = kmalloc(sizeof *work, GFP_ATOMIC);
-		if (work) {
-			INIT_WORK(&work->work, ib_cache_task);
-			work->device   = event->device;
-			work->port_num = event->element.port_num;
-			if (event->event == IB_EVENT_PKEY_CHANGE ||
-			    event->event == IB_EVENT_GID_CHANGE)
-				work->enforce_security = true;
-			else
-				work->enforce_security = false;
-
-			queue_work(ib_wq, &work->work);
-		}
-	}
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return;
+
+	if (is_cache_update_event(event))
+		INIT_WORK(&work->work, ib_cache_event_task);
+	else
+		INIT_WORK(&work->work, ib_generic_event_task);
+
+	work->event = *event;
+	if (event->event == IB_EVENT_PKEY_CHANGE ||
+	    event->event == IB_EVENT_GID_CHANGE)
+		work->enforce_security = true;
+
+	queue_work(ib_wq, &work->work);
 }
+EXPORT_SYMBOL(ib_dispatch_event);
 
 int ib_cache_setup_one(struct ib_device *device)
 {
@@ -1511,9 +1545,6 @@ int ib_cache_setup_one(struct ib_device *device)
 	rdma_for_each_port (device, p)
 		ib_cache_update(device, p, true);
 
-	INIT_IB_EVENT_HANDLER(&device->cache.event_handler,
-			      device, ib_cache_event);
-	ib_register_event_handler(&device->cache.event_handler);
 	return 0;
 }
 
@@ -1535,14 +1566,12 @@ void ib_cache_release_one(struct ib_device *device)
 
 void ib_cache_cleanup_one(struct ib_device *device)
 {
-	/* The cleanup function unregisters the event handler,
-	 * waits for all in-progress workqueue elements and cleans
-	 * up the GID cache. This function should be called after
-	 * the device was removed from the devices list and all
-	 * clients were removed, so the cache exists but is
+	/* The cleanup function waits for all in-progress workqueue
+	 * elements and cleans up the GID cache. This function should be
+	 * called after the device was removed from the devices list and
+	 * all clients were removed, so the cache exists but is
 	 * non-functional and shouldn't be updated anymore.
 	 */
-	ib_unregister_event_handler(&device->cache.event_handler);
 	flush_workqueue(ib_wq);
 	gid_table_cleanup_one(device);
 
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3a8b0911c3bc..362b0231f7a1 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -149,6 +149,7 @@ unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);
+void ib_dispatch_event_clients(struct ib_event *event);
 
 #ifdef CONFIG_CGROUP_RDMA
 void ib_device_register_rdmacg(struct ib_device *device);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a667636f74bf..3524fa73a3ea 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1951,15 +1951,7 @@ void ib_unregister_event_handler(struct ib_event_handler *event_handler)
 }
 EXPORT_SYMBOL(ib_unregister_event_handler);
 
-/**
- * ib_dispatch_event - Dispatch an asynchronous event
- * @event:Event to dispatch
- *
- * Low-level drivers must call ib_dispatch_event() to dispatch the
- * event to all registered event handlers when an asynchronous event
- * occurs.
- */
-void ib_dispatch_event(struct ib_event *event)
+void ib_dispatch_event_clients(struct ib_event *event)
 {
 	unsigned long flags;
 	struct ib_event_handler *handler;
@@ -1971,7 +1963,6 @@ void ib_dispatch_event(struct ib_event *event)
 
 	spin_unlock_irqrestore(&event->device->event_handler_lock, flags);
 }
-EXPORT_SYMBOL(ib_dispatch_event);
 
 static int iw_query_port(struct ib_device *device,
 			   u8 port_num,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6a47ba85c54c..586da67ff0f5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2146,7 +2146,6 @@ struct ib_port_cache {
 
 struct ib_cache {
 	rwlock_t                lock;
-	struct ib_event_handler event_handler;
 };
 
 struct ib_port_immutable {
@@ -2897,7 +2896,7 @@ bool ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
 
 void ib_register_event_handler(struct ib_event_handler *event_handler);
 void ib_unregister_event_handler(struct ib_event_handler *event_handler);
-void ib_dispatch_event(struct ib_event *event);
+void ib_dispatch_event(const struct ib_event *event);
 
 int ib_query_port(struct ib_device *device,
 		  u8 port_num, struct ib_port_attr *port_attr);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 18:11             ` Parav Pandit
@ 2019-10-29 18:26               ` Jason Gunthorpe
  2019-10-29 19:28                 ` Parav Pandit
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2019-10-29 18:26 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens

On Tue, Oct 29, 2019 at 06:11:01PM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Tuesday, October 29, 2019 10:55 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>; RDMA
> > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > <danielj@mellanox.com>
> > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> > update events
> > 
> > On Tue, Oct 29, 2019 at 03:47:42PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > Sent: Tuesday, October 29, 2019 10:34 AM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > > > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>;
> > RDMA
> > > > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > > > <danielj@mellanox.com>
> > > > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core
> > > > distribute cache update events
> > > >
> > > > On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:
> > > >
> > > > > > > +/**
> > > > > > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > > > > > + * @event:Event to dispatch
> > > > > > > + *
> > > > > > > + * Low-level drivers must call ib_dispatch_event() to
> > > > > > > +dispatch the
> > > > > > > + * event to all registered event handlers when an
> > > > > > > +asynchronous event
> > > > > > > + * occurs.
> > > > > > > + */
> > > > > > > +void ib_dispatch_event(struct ib_event *event) {
> > > > > > > +	ib_enqueue_cache_update_event(event);
> > > > > > > +}
> > > > > > >  EXPORT_SYMBOL(ib_dispatch_event);
> > > > > >
> > > > > > Why not just move this into cache.c?
> > > > > >
> > > > > Same thought same to me when I had to add one liner call.
> > > > > However the issue was device.c has the code for the event
> > > > registration/unregistration and calling the handlers unrelated to cache.
> > > > > So moving ib_dispatch_event() to cache.c looked incorrect to me.
> > > >
> > > > Well, maybe we can move the wq code from the cache.c into here?
> > >
> > > I prefer to keep all cache specific code in cache.c because there is
> > > where we flush the ib_wq, queue work there.
> > 
> > I think that is because the cache is not subscribing to a notifier, it really should,
> > then things order properly and the flush is not needed.
> > 
> Cache shouldn't subscribe to the notifier, that is the race
> condition issue.

Notifiers are ordered, as long as the cache subscribes first to the
notifier list it is not a 'race condition'.

This is still somewhat problematic because 'ib_dispatch_event' still
calls the handlers under a spinlock (ie the handlers are still
non-blocking contexts)

I'm suggesting to swap it for a normal blocking notifier chain and
guarentee the cache is the first entry in the chain.

Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache update events
  2019-10-29 18:26               ` Jason Gunthorpe
@ 2019-10-29 19:28                 ` Parav Pandit
  0 siblings, 0 replies; 12+ messages in thread
From: Parav Pandit @ 2019-10-29 19:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Daniel Jurgens



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Tuesday, October 29, 2019 1:26 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>; RDMA
> mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> <danielj@mellanox.com>
> Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core distribute cache
> update events
> 
> On Tue, Oct 29, 2019 at 06:11:01PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Tuesday, October 29, 2019 10:55 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>;
> RDMA
> > > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > > <danielj@mellanox.com>
> > > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core
> > > distribute cache update events
> > >
> > > On Tue, Oct 29, 2019 at 03:47:42PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > > Sent: Tuesday, October 29, 2019 10:34 AM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Leon Romanovsky <leon@kernel.org>; Doug Ledford
> > > > > <dledford@redhat.com>; Leon Romanovsky <leonro@mellanox.com>;
> > > RDMA
> > > > > mailing list <linux-rdma@vger.kernel.org>; Daniel Jurgens
> > > > > <danielj@mellanox.com>
> > > > > Subject: Re: [PATCH rdma-next v1 1/2] IB/core: Let IB core
> > > > > distribute cache update events
> > > > >
> > > > > On Tue, Oct 29, 2019 at 03:32:46PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > > > +/**
> > > > > > > > + * ib_dispatch_event - Dispatch an asynchronous event
> > > > > > > > + * @event:Event to dispatch
> > > > > > > > + *
> > > > > > > > + * Low-level drivers must call ib_dispatch_event() to
> > > > > > > > +dispatch the
> > > > > > > > + * event to all registered event handlers when an
> > > > > > > > +asynchronous event
> > > > > > > > + * occurs.
> > > > > > > > + */
> > > > > > > > +void ib_dispatch_event(struct ib_event *event) {
> > > > > > > > +	ib_enqueue_cache_update_event(event);
> > > > > > > > +}
> > > > > > > >  EXPORT_SYMBOL(ib_dispatch_event);
> > > > > > >
> > > > > > > Why not just move this into cache.c?
> > > > > > >
> > > > > > Same thought same to me when I had to add one liner call.
> > > > > > However the issue was device.c has the code for the event
> > > > > registration/unregistration and calling the handlers unrelated to cache.
> > > > > > So moving ib_dispatch_event() to cache.c looked incorrect to me.
> > > > >
> > > > > Well, maybe we can move the wq code from the cache.c into here?
> > > >
> > > > I prefer to keep all cache specific code in cache.c because there
> > > > is where we flush the ib_wq, queue work there.
> > >
> > > I think that is because the cache is not subscribing to a notifier,
> > > it really should, then things order properly and the flush is not needed.
> > >
> > Cache shouldn't subscribe to the notifier, that is the race condition
> > issue.
> 
> Notifiers are ordered, as long as the cache subscribes first to the notifier list it
> is not a 'race condition'.
> 
> This is still somewhat problematic because 'ib_dispatch_event' still calls the
> handlers under a spinlock (ie the handlers are still non-blocking contexts)
> 
It is not problematic  unless we ask ULPs to execute most of their work in the notifier context.
It is sub optimal at the moment.

> I'm suggesting to swap it for a normal blocking notifier chain and guarantee
Before writing this version, I considered that option.
However the blocker was qp's event_handler callback.
In below two call traces rw_semaphore will be triggered in interrupt context.
So wanted to split the QP's event_handler with device->event_handler.
After I split both, than running device's event_handler will be possible with cache as first entry in chain.
Shall I revise the series to split two handlers?

qedr_iw_event_handler
  qedr_iw_qp_event

hns_roce_v2_aeq_int
  hns_roce_qp_event
    qp->event_handler
       __ib_shared_qp_event_handler()

> the cache is the first entry in the chain.
> 
> Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-10-29 19:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-10-29 11:53 [PATCH rdma-next v1 0/2] Let IB core distribute cache update events Leon Romanovsky
2019-10-29 11:53 ` [PATCH rdma-next v1 1/2] IB/core: " Leon Romanovsky
2019-10-29 13:31   ` Parav Pandit
2019-10-29 15:24   ` Jason Gunthorpe
2019-10-29 15:32     ` Parav Pandit
2019-10-29 15:33       ` Jason Gunthorpe
2019-10-29 15:47         ` Parav Pandit
2019-10-29 15:54           ` Jason Gunthorpe
2019-10-29 18:11             ` Parav Pandit
2019-10-29 18:26               ` Jason Gunthorpe
2019-10-29 19:28                 ` Parav Pandit
2019-10-29 11:53 ` [PATCH rdma-next v1 2/2] IB/core: Cut down single member ib_cache structure Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox