[PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
@ 2026-01-27  9:38 Jiri Pirko
  2026-01-27 11:14 ` Tetsuo Handa
  2026-02-24  8:59 ` Leon Romanovsky
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Pirko @ 2026-01-27  9:38 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, penguin-kernel, roman.gushchin, wangliang74,
	yanjun.zhu

From: Jiri Pirko <jiri@nvidia.com>

RoCE GID entries become stale when netdev properties change during the
IB device registration window. This is reproducible with a udev rule
that sets a MAC address when a VF netdev appears:

  ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
    RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"

After VF creation, show_gids displays GIDs derived from the original
random MAC rather than the configured one.

The root cause is a race between netdev event processing and device
registration:

  CPU 0 (driver)                    CPU 1 (udev/workqueue)
  ──────────────                    ──────────────────────
  ib_register_device()
    ib_cache_setup_one()
      gid_table_setup_one()
        _gid_table_setup_one()
          ← GID table allocated
        rdma_roce_rescan_device()
          ← GIDs populated with
            OLD MAC
                                    ip link set eth4 addr NEW_MAC
                                    NETDEV_CHANGEADDR queued
                                    netdevice_event_work_handler()
                                      ib_enum_all_roce_netdevs()
                                        ← Iterates DEVICE_REGISTERED
                                        ← Device NOT marked yet, SKIP!
    enable_device_and_get()
      xa_set_mark(DEVICE_REGISTERED)
          ← Too late, event was lost

The netdev event handler uses ib_enum_all_roce_netdevs() which only
iterates devices marked DEVICE_REGISTERED. However, this mark is set
late in the registration process, after the GID cache is already
populated. Events arriving in this window are silently dropped.

Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
set immediately after the GID table is allocated and initialized. Use
the new mark in ib_enum_all_roce_netdevs() function to iterate devices
instead of DEVICE_REGISTERED.

This is safe because:
- After _gid_table_setup_one(), all required structures exist (port_data,
  immutable, cache.gid)
- The GID table mutex serializes concurrent access between the initial
  rescan and event handlers
- Event handlers correctly update stale GIDs even when racing with rescan
- The mark is cleared in ib_cache_cleanup_one() before teardown

This also fixes similar races for IP address events (inetaddr_event,
inet6addr_event) which use the same enumeration path.

Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/infiniband/core/cache.c     | 13 +++++++++++
 drivers/infiniband/core/core_priv.h |  3 +++
 drivers/infiniband/core/device.c    | 34 ++++++++++++++++++++++++++++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 81cf3c902e81..e0bef9fa6dc0 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -927,6 +927,13 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
 	if (err)
 		return err;
 
+	/*
+	 * Mark the device as ready for GID cache updates. This allows netdev
+	 * event handlers to update the GID cache even before the device is
+	 * fully registered.
+	 */
+	ib_device_enable_gid_updates(ib_dev);
+
 	rdma_roce_rescan_device(ib_dev);
 
 	return err;
@@ -1638,6 +1645,12 @@ void ib_cache_release_one(struct ib_device *device)
 
 void ib_cache_cleanup_one(struct ib_device *device)
 {
+	/*
+	 * Clear the GID updates mark first to prevent event handlers from
+	 * accessing the device while it's being torn down.
+	 */
+	ib_device_disable_gid_updates(device);
+
 	/* The cleanup function waits for all in-progress workqueue
 	 * elements and cleans up the GID cache. This function should be
 	 * called after the device was removed from the devices list and
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..a2c36666e6fc 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -100,6 +100,9 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
 			      roce_netdev_callback cb,
 			      void *cookie);
 
+void ib_device_enable_gid_updates(struct ib_device *device);
+void ib_device_disable_gid_updates(struct ib_device *device);
+
 typedef int (*nldev_callback)(struct ib_device *device,
 			      struct sk_buff *skb,
 			      struct netlink_callback *cb,
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 1174ab7da629..87eaefd3794b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -93,6 +93,7 @@ static struct workqueue_struct *ib_unreg_wq;
 static DEFINE_XARRAY_FLAGS(devices, XA_FLAGS_ALLOC);
 static DECLARE_RWSEM(devices_rwsem);
 #define DEVICE_REGISTERED XA_MARK_1
+#define DEVICE_GID_UPDATES XA_MARK_2
 
 static u32 highest_client_id;
 #define CLIENT_REGISTERED XA_MARK_1
@@ -2441,11 +2442,42 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
 	unsigned long index;
 
 	down_read(&devices_rwsem);
-	xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED)
+	xa_for_each_marked(&devices, index, dev, DEVICE_GID_UPDATES)
 		ib_enum_roce_netdev(dev, filter, filter_cookie, cb, cookie);
 	up_read(&devices_rwsem);
 }
 
+/**
+ * ib_device_enable_gid_updates - Mark device as ready for GID cache updates
+ * @device: Device to mark
+ *
+ * Called after GID table is allocated and initialized. After this mark is set,
+ * netdevice event handlers can update the device's GID cache. This allows
+ * events that arrive during device registration to be processed, avoiding
+ * stale GID entries when netdev properties change during the device
+ * registration process.
+ */
+void ib_device_enable_gid_updates(struct ib_device *device)
+{
+	down_write(&devices_rwsem);
+	xa_set_mark(&devices, device->index, DEVICE_GID_UPDATES);
+	up_write(&devices_rwsem);
+}
+
+/**
+ * ib_device_disable_gid_updates - Clear the GID updates mark
+ * @device: Device to unmark
+ *
+ * Called before GID table cleanup to prevent event handlers from accessing
+ * the device while it's being torn down.
+ */
+void ib_device_disable_gid_updates(struct ib_device *device)
+{
+	down_write(&devices_rwsem);
+	xa_clear_mark(&devices, device->index, DEVICE_GID_UPDATES);
+	up_write(&devices_rwsem);
+}
+
 /*
  * ib_enum_all_devs - enumerate all ib_devices
  * @cb: Callback to call for each found ib_device
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27  9:38 [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration Jiri Pirko
@ 2026-01-27 11:14 ` Tetsuo Handa
  2026-01-27 11:58   ` Jiri Pirko
  2026-01-27 16:00   ` Jason Gunthorpe
  2026-02-24  8:59 ` Leon Romanovsky
  1 sibling, 2 replies; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-27 11:14 UTC (permalink / raw)
  To: Jiri Pirko, linux-rdma
  Cc: jgg, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, wangliang74, yanjun.zhu

On 2026/01/27 18:38, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> RoCE GID entries become stale when netdev properties change during the
> IB device registration window. This is reproducible with a udev rule
> that sets a MAC address when a VF netdev appears:
> 
>   ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
>     RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"
> 
> After VF creation, show_gids displays GIDs derived from the original
> random MAC rather than the configured one.
> 
> The root cause is a race between netdev event processing and device
> registration:
> 
>   CPU 0 (driver)                    CPU 1 (udev/workqueue)
>   ──────────────                    ──────────────────────
>   ib_register_device()
>     ib_cache_setup_one()
>       gid_table_setup_one()
>         _gid_table_setup_one()
>           ← GID table allocated
>         rdma_roce_rescan_device()
>           ← GIDs populated with
>             OLD MAC
>                                     ip link set eth4 addr NEW_MAC
>                                     NETDEV_CHANGEADDR queued
>                                     netdevice_event_work_handler()
>                                       ib_enum_all_roce_netdevs()
>                                         ← Iterates DEVICE_REGISTERED
>                                         ← Device NOT marked yet, SKIP!
>     enable_device_and_get()
>       xa_set_mark(DEVICE_REGISTERED)
>           ← Too late, event was lost
> 
> The netdev event handler uses ib_enum_all_roce_netdevs() which only
> iterates devices marked DEVICE_REGISTERED. However, this mark is set
> late in the registration process, after the GID cache is already
> populated. Events arriving in this window are silently dropped.
> 
> Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
> set immediately after the GID table is allocated and initialized. Use
> the new mark in ib_enum_all_roce_netdevs() function to iterate devices
> instead of DEVICE_REGISTERED.
> 
> This is safe because:
> - After _gid_table_setup_one(), all required structures exist (port_data,
>   immutable, cache.gid)
> - The GID table mutex serializes concurrent access between the initial
>   rescan and event handlers
> - Event handlers correctly update stale GIDs even when racing with rescan
> - The mark is cleared in ib_cache_cleanup_one() before teardown
> 
> This also fixes similar races for IP address events (inetaddr_event,
> inet6addr_event) which use the same enumeration path.
> 
> Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

I was thinking making the NETDEV_UNREGISTER event handler valid until
wait_for_completion(&device->unreg_completion) in disable_device() returns
( https://lkml.kernel.org/r/b4a09ad8-97cc-4fe1-b02a-6192248694a8@I-love.SAKURA.ne.jp ).

Since your patch includes what I was trying to address, you can add

Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84

lines.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 11:14 ` Tetsuo Handa
@ 2026-01-27 11:58   ` Jiri Pirko
  2026-01-27 16:00   ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Pirko @ 2026-01-27 11:58 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-rdma, jgg, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, wangliang74, yanjun.zhu

Tue, Jan 27, 2026 at 12:14:58PM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>On 2026/01/27 18:38, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> RoCE GID entries become stale when netdev properties change during the
>> IB device registration window. This is reproducible with a udev rule
>> that sets a MAC address when a VF netdev appears:
>> 
>>   ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
>>     RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"
>> 
>> After VF creation, show_gids displays GIDs derived from the original
>> random MAC rather than the configured one.
>> 
>> The root cause is a race between netdev event processing and device
>> registration:
>> 
>>   CPU 0 (driver)                    CPU 1 (udev/workqueue)
>>   ──────────────                    ──────────────────────
>>   ib_register_device()
>>     ib_cache_setup_one()
>>       gid_table_setup_one()
>>         _gid_table_setup_one()
>>           ← GID table allocated
>>         rdma_roce_rescan_device()
>>           ← GIDs populated with
>>             OLD MAC
>>                                     ip link set eth4 addr NEW_MAC
>>                                     NETDEV_CHANGEADDR queued
>>                                     netdevice_event_work_handler()
>>                                       ib_enum_all_roce_netdevs()
>>                                         ← Iterates DEVICE_REGISTERED
>>                                         ← Device NOT marked yet, SKIP!
>>     enable_device_and_get()
>>       xa_set_mark(DEVICE_REGISTERED)
>>           ← Too late, event was lost
>> 
>> The netdev event handler uses ib_enum_all_roce_netdevs() which only
>> iterates devices marked DEVICE_REGISTERED. However, this mark is set
>> late in the registration process, after the GID cache is already
>> populated. Events arriving in this window are silently dropped.
>> 
>> Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
>> set immediately after the GID table is allocated and initialized. Use
>> the new mark in ib_enum_all_roce_netdevs() function to iterate devices
>> instead of DEVICE_REGISTERED.
>> 
>> This is safe because:
>> - After _gid_table_setup_one(), all required structures exist (port_data,
>>   immutable, cache.gid)
>> - The GID table mutex serializes concurrent access between the initial
>>   rescan and event handlers
>> - Event handlers correctly update stale GIDs even when racing with rescan
>> - The mark is cleared in ib_cache_cleanup_one() before teardown
>> 
>> This also fixes similar races for IP address events (inetaddr_event,
>> inet6addr_event) which use the same enumeration path.
>> 
>> Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>
>I was thinking making the NETDEV_UNREGISTER event handler valid until
>wait_for_completion(&device->unreg_completion) in disable_device() returns
>( https://lkml.kernel.org/r/b4a09ad8-97cc-4fe1-b02a-6192248694a8@I-love.SAKURA.ne.jp ).
>
>Since your patch includes what I was trying to address, you can add
>
>Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
>Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84

Indeed, looks like this fixes the issue you pointed out as well.
Jason, should I re-post or can you take the tags above during apply?

Thanks!


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 11:14 ` Tetsuo Handa
  2026-01-27 11:58   ` Jiri Pirko
@ 2026-01-27 16:00   ` Jason Gunthorpe
  2026-01-27 21:17     ` Yanjun.Zhu
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2026-01-27 16:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, wangliang74,
	yanjun.zhu

On Tue, Jan 27, 2026 at 08:14:58PM +0900, Tetsuo Handa wrote:
> On 2026/01/27 18:38, Jiri Pirko wrote:
> > From: Jiri Pirko <jiri@nvidia.com>
> > 
> > RoCE GID entries become stale when netdev properties change during the
> > IB device registration window. This is reproducible with a udev rule
> > that sets a MAC address when a VF netdev appears:
> > 
> >   ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
> >     RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"
> > 
> > After VF creation, show_gids displays GIDs derived from the original
> > random MAC rather than the configured one.
> > 
> > The root cause is a race between netdev event processing and device
> > registration:
> > 
> >   CPU 0 (driver)                    CPU 1 (udev/workqueue)
> >   ──────────────                    ──────────────────────
> >   ib_register_device()
> >     ib_cache_setup_one()
> >       gid_table_setup_one()
> >         _gid_table_setup_one()
> >           ← GID table allocated
> >         rdma_roce_rescan_device()
> >           ← GIDs populated with
> >             OLD MAC
> >                                     ip link set eth4 addr NEW_MAC
> >                                     NETDEV_CHANGEADDR queued
> >                                     netdevice_event_work_handler()
> >                                       ib_enum_all_roce_netdevs()
> >                                         ← Iterates DEVICE_REGISTERED
> >                                         ← Device NOT marked yet, SKIP!
> >     enable_device_and_get()
> >       xa_set_mark(DEVICE_REGISTERED)
> >           ← Too late, event was lost
> > 
> > The netdev event handler uses ib_enum_all_roce_netdevs() which only
> > iterates devices marked DEVICE_REGISTERED. However, this mark is set
> > late in the registration process, after the GID cache is already
> > populated. Events arriving in this window are silently dropped.
> > 
> > Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
> > set immediately after the GID table is allocated and initialized. Use
> > the new mark in ib_enum_all_roce_netdevs() function to iterate devices
> > instead of DEVICE_REGISTERED.
> > 
> > This is safe because:
> > - After _gid_table_setup_one(), all required structures exist (port_data,
> >   immutable, cache.gid)
> > - The GID table mutex serializes concurrent access between the initial
> >   rescan and event handlers
> > - Event handlers correctly update stale GIDs even when racing with rescan
> > - The mark is cleared in ib_cache_cleanup_one() before teardown
> > 
> > This also fixes similar races for IP address events (inetaddr_event,
> > inet6addr_event) which use the same enumeration path.
> > 
> > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
> > Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> 
> I was thinking making the NETDEV_UNREGISTER event handler valid until
> wait_for_completion(&device->unreg_completion) in disable_device() returns
> ( https://lkml.kernel.org/r/b4a09ad8-97cc-4fe1-b02a-6192248694a8@I-love.SAKURA.ne.jp ).
> 
> Since your patch includes what I was trying to address, you can add
> 
> Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
> 
> lines.

Can we feed it to syzkaller please and see if it does actually clear
it's repo? That particular bug already has 5 patches claiming to fix
it.

It has become some kind of catch all of all kinds of refcounting errors

[  247.188486][ T6052] unregister_netdevice: waiting for vcan0 to become free. Usage count = 2

Does this actually change the refcounting around that could fix that?
Looked like no?

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 16:00   ` Jason Gunthorpe
@ 2026-01-27 21:17     ` Yanjun.Zhu
  2026-01-27 22:16       ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Yanjun.Zhu @ 2026-01-27 21:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Tetsuo Handa
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, wangliang74


On 1/27/26 8:00 AM, Jason Gunthorpe wrote:
> On Tue, Jan 27, 2026 at 08:14:58PM +0900, Tetsuo Handa wrote:
>> On 2026/01/27 18:38, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@nvidia.com>
>>>
>>> RoCE GID entries become stale when netdev properties change during the
>>> IB device registration window. This is reproducible with a udev rule
>>> that sets a MAC address when a VF netdev appears:
>>>
>>>    ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
>>>      RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"
>>>
>>> After VF creation, show_gids displays GIDs derived from the original
>>> random MAC rather than the configured one.
>>>
>>> The root cause is a race between netdev event processing and device
>>> registration:
>>>
>>>    CPU 0 (driver)                    CPU 1 (udev/workqueue)
>>>    ──────────────                    ──────────────────────
>>>    ib_register_device()
>>>      ib_cache_setup_one()
>>>        gid_table_setup_one()
>>>          _gid_table_setup_one()
>>>            ← GID table allocated
>>>          rdma_roce_rescan_device()
>>>            ← GIDs populated with
>>>              OLD MAC
>>>                                      ip link set eth4 addr NEW_MAC
>>>                                      NETDEV_CHANGEADDR queued
>>>                                      netdevice_event_work_handler()
>>>                                        ib_enum_all_roce_netdevs()
>>>                                          ← Iterates DEVICE_REGISTERED
>>>                                          ← Device NOT marked yet, SKIP!
>>>      enable_device_and_get()
>>>        xa_set_mark(DEVICE_REGISTERED)
>>>            ← Too late, event was lost
>>>
>>> The netdev event handler uses ib_enum_all_roce_netdevs() which only
>>> iterates devices marked DEVICE_REGISTERED. However, this mark is set
>>> late in the registration process, after the GID cache is already
>>> populated. Events arriving in this window are silently dropped.
>>>
>>> Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is
>>> set immediately after the GID table is allocated and initialized. Use
>>> the new mark in ib_enum_all_roce_netdevs() function to iterate devices
>>> instead of DEVICE_REGISTERED.
>>>
>>> This is safe because:
>>> - After _gid_table_setup_one(), all required structures exist (port_data,
>>>    immutable, cache.gid)
>>> - The GID table mutex serializes concurrent access between the initial
>>>    rescan and event handlers
>>> - Event handlers correctly update stale GIDs even when racing with rescan
>>> - The mark is cleared in ib_cache_cleanup_one() before teardown
>>>
>>> This also fixes similar races for IP address events (inetaddr_event,
>>> inet6addr_event) which use the same enumeration path.
>>>
>>> Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
>>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> I was thinking making the NETDEV_UNREGISTER event handler valid until
>> wait_for_completion(&device->unreg_completion) in disable_device() returns
>> ( https://lkml.kernel.org/r/b4a09ad8-97cc-4fe1-b02a-6192248694a8@I-love.SAKURA.ne.jp ).
>>
>> Since your patch includes what I was trying to address, you can add
>>
>> Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
>>
>> lines.
> Can we feed it to syzkaller please and see if it does actually clear
> it's repo? That particular bug already has 5 patches claiming to fix
> it.


#syz test: repository_link branch

The above command will make syzkaller test your commit.

BTW, your commit should be the topmost commit.


Thanks,

Zhu Yanjun

>
> It has become some kind of catch all of all kinds of refcounting errors
>
> [  247.188486][ T6052] unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
>
> Does this actually change the refcounting around that could fix that?
> Looked like no?
>
> Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 21:17     ` Yanjun.Zhu
@ 2026-01-27 22:16       ` Tetsuo Handa
  2026-01-27 22:54         ` Yanjun.Zhu
  0 siblings, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-27 22:16 UTC (permalink / raw)
  To: Yanjun.Zhu, Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, wangliang74

On 2026/01/28 6:17, Yanjun.Zhu wrote:
>> Can we feed it to syzkaller please and see if it does actually clear
>> it's repo? That particular bug already has 5 patches claiming to fix
>> it.
> 
> 
> #syz test: repository_link branch
> 
> The above command will make syzkaller test your commit.
> 
> BTW, your commit should be the topmost commit.

No, we can't ask syzbot to test this patch.

All kinds of refcounting errors that appear as

  unregister_netdevice: waiting for $dev to become free. Usage count = $count

problem are reported there, but syzbot has not found a reproducer for
the bug this patch addresses.

I can test this patch using linux-next tree via my tree if you like.

> 
> 
> Thanks,
> 
> Zhu Yanjun
> 
>>
>> It has become some kind of catch all of all kinds of refcounting errors
>>
>> [  247.188486][ T6052] unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
>>
>> Does this actually change the refcounting around that could fix that?
>> Looked like no?
>>
>> Jason


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 22:16       ` Tetsuo Handa
@ 2026-01-27 22:54         ` Yanjun.Zhu
  2026-01-28  4:52           ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Yanjun.Zhu @ 2026-01-27 22:54 UTC (permalink / raw)
  To: Tetsuo Handa, Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, wangliang74


On 1/27/26 2:16 PM, Tetsuo Handa wrote:
> On 2026/01/28 6:17, Yanjun.Zhu wrote:
>>> Can we feed it to syzkaller please and see if it does actually clear
>>> it's repo? That particular bug already has 5 patches claiming to fix
>>> it.
>>
>> #syz test: repository_link branch
>>
>> The above command will make syzkaller test your commit.
>>
>> BTW, your commit should be the topmost commit.
> No, we can't ask syzbot to test this patch.
>
> All kinds of refcounting errors that appear as
>
>    unregister_netdevice: waiting for $dev to become free. Usage count = $count
>
> problem are reported there, but syzbot has not found a reproducer for
> the bug this patch addresses.
>
> I can test this patch using linux-next tree via my tree if you like.

Thanks. Currently, syzbot does not have a reproducer to reproduce and 
verify this issue.

However, you have a test environment to reproduce the problem, and you 
can verify this commit.

Is that correct?

If so, that would be very helpful.


Zhu Yanjun

>
>>
>> Thanks,
>>
>> Zhu Yanjun
>>
>>> It has become some kind of catch all of all kinds of refcounting errors
>>>
>>> [  247.188486][ T6052] unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
>>>
>>> Does this actually change the refcounting around that could fix that?
>>> Looked like no?
>>>
>>> Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27 22:54         ` Yanjun.Zhu
@ 2026-01-28  4:52           ` Tetsuo Handa
  2026-01-28  8:26             ` Tetsuo Handa
  2026-01-28 13:43             ` Jiri Pirko
  0 siblings, 2 replies; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-28  4:52 UTC (permalink / raw)
  To: Yanjun.Zhu, Jason Gunthorpe, Jiri Pirko
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin

On 2026/01/28 7:54, Yanjun.Zhu wrote:
> 
> On 1/27/26 2:16 PM, Tetsuo Handa wrote:
>> On 2026/01/28 6:17, Yanjun.Zhu wrote:
>>>> Can we feed it to syzkaller please and see if it does actually clear
>>>> it's repo? That particular bug already has 5 patches claiming to fix
>>>> it.
>>>
>>> #syz test: repository_link branch
>>>
>>> The above command will make syzkaller test your commit.
>>>
>>> BTW, your commit should be the topmost commit.
>> No, we can't ask syzbot to test this patch.
>>
>> All kinds of refcounting errors that appear as
>>
>>    unregister_netdevice: waiting for $dev to become free. Usage count = $count
>>
>> problem are reported there, but syzbot has not found a reproducer for
>> the bug this patch addresses.
>>
>> I can test this patch using linux-next tree via my tree if you like.
> 
> Thanks. Currently, syzbot does not have a reproducer to reproduce and verify this issue.

Correct. syzbot has a test environment but does not have a reproducer for this issue.

> 
> However, you have a test environment to reproduce the problem, and you can verify this commit.

Not correct. I have neither a test environment nor a reproducer.

What I have is a git tree that doesn't care about doing "git reset --hard" followed by
"git push -f". I'm currently carrying debug printk() patches that are intended to be
utilized by syzbot reports from linux-next tree, for CONFIG_NET_DEV_REFCNT_TRACKER is not
sufficient for debugging
"unregister_netdevice: waiting for $dev to become free. Usage count = $count" problems.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20260126&id=967b7e51aff00eae187d924c0097c0691d27421e

Before you make a permanent change in your git tree, we can try if syzbot still generates
reports like https://lkml.kernel.org/r/c1f9511c-7ad0-444d-aa0c-516674702b4e@I-love.SAKURA.ne.jp
with Jiri's patch applied. I guess it would take about 2 weeks before we can make a judgement.

Two things I worry about Jiri's patch are that

  refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
  DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()

and

  I'm not convinced that it is safe/meaningful to keep DEVICE_GID_UPDATES notifications valid
  between wait_for_completion(&device->unreg_completion) in disable_device() and the beginning
  of ib_cache_cleanup_one() because I don't know whether DEVICE_GID_UPDATES notifications can
  make sense after device->refcount became 0

.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-28  4:52           ` Tetsuo Handa
@ 2026-01-28  8:26             ` Tetsuo Handa
  2026-01-28 13:40               ` Jiri Pirko
  2026-01-28 13:43             ` Jiri Pirko
  1 sibling, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-28  8:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

On 2026/01/28 13:52, Tetsuo Handa wrote:
> Two things I worry about Jiri's patch are that
> 
>   refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
>   DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()

Well, since siw_netdev_event() is calling ib_device_put()
( https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/infiniband/sw/siw/siw_main.c#L403 ),
calling refcount_set() immediately before setting
xa_set_mark(&devices, device->index, DEVICE_REGISTERED) is no longer safe.

> 
> and
> 
>   I'm not convinced that it is safe/meaningful to keep DEVICE_GID_UPDATES notifications valid
>   between wait_for_completion(&device->unreg_completion) in disable_device() and the beginning
>   of ib_cache_cleanup_one() because I don't know whether DEVICE_GID_UPDATES notifications can
>   make sense after device->refcount became 0
> 
> .
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-28  8:26             ` Tetsuo Handa
@ 2026-01-28 13:40               ` Jiri Pirko
  2026-01-29  6:06                 ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Pirko @ 2026-01-28 13:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

Wed, Jan 28, 2026 at 09:26:52AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>On 2026/01/28 13:52, Tetsuo Handa wrote:
>> Two things I worry about Jiri's patch are that
>> 
>>   refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
>>   DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()
>
>Well, since siw_netdev_event() is calling ib_device_put()
>( https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/infiniband/sw/siw/siw_main.c#L403 ),
>calling refcount_set() immediately before setting
>xa_set_mark(&devices, device->index, DEVICE_REGISTERED) is no longer safe.

I may be missing something. How exactly is the notifier in siw_main
related to this patch? I don't see it depends on DEVICE_GID_UPDATES
mark.


>
>> 
>> and
>> 
>>   I'm not convinced that it is safe/meaningful to keep DEVICE_GID_UPDATES notifications valid
>>   between wait_for_completion(&device->unreg_completion) in disable_device() and the beginning
>>   of ib_cache_cleanup_one() because I don't know whether DEVICE_GID_UPDATES notifications can
>>   make sense after device->refcount became 0
>> 
>> .
>> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-28  4:52           ` Tetsuo Handa
  2026-01-28  8:26             ` Tetsuo Handa
@ 2026-01-28 13:43             ` Jiri Pirko
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Pirko @ 2026-01-28 13:43 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Yanjun.Zhu, Jason Gunthorpe, linux-rdma, leon, msanalla, maorg,
	parav, mbloch, markzhang, marco.crivellari, roman.gushchin

Wed, Jan 28, 2026 at 05:52:42AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:

[...]

>
>Two things I worry about Jiri's patch are that
>
>  refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
>  DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()

Perhaps I'm missing something, but where exactly in the notifications
related to this patch is someone calling ib_device_put()?


>
>and
>
>  I'm not convinced that it is safe/meaningful to keep DEVICE_GID_UPDATES notifications valid
>  between wait_for_completion(&device->unreg_completion) in disable_device() and the beginning
>  of ib_cache_cleanup_one() because I don't know whether DEVICE_GID_UPDATES notifications can
>  make sense after device->refcount became 0

Could you please clearly elaborate what seems to be a problem? I don't
see why notifications related to this patchset can't update gids for
devices with refcount 0. Do you?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-28 13:40               ` Jiri Pirko
@ 2026-01-29  6:06                 ` Tetsuo Handa
  2026-01-29 12:38                   ` Jiri Pirko
  0 siblings, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-29  6:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

On 2026/01/28 22:40, Jiri Pirko wrote:
> Wed, Jan 28, 2026 at 09:26:52AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>> On 2026/01/28 13:52, Tetsuo Handa wrote:
>>> Two things I worry about Jiri's patch are that
>>>
>>>   refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
>>>   DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()
>>
>> Well, since siw_netdev_event() is calling ib_device_put()
>> ( https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/infiniband/sw/siw/siw_main.c#L403 ),
>> calling refcount_set() immediately before setting
>> xa_set_mark(&devices, device->index, DEVICE_REGISTERED) is no longer safe.
> 
> I may be missing something. How exactly is the notifier in siw_main
> related to this patch? I don't see it depends on DEVICE_GID_UPDATES
> mark.

I don't have an accurate and complete list of notifier callback functions that are called by
DEVICE_GID_UPDATES notifications. But in general, I think that there is a pattern that an IB
notifier callback function tries to grab a reference to "struct ib_device" before doing
something, and releases that reference after doing something.

If you can prove that none of callback functions that are called by DEVICE_GID_UPDATES
notifications follows this pattern, it will be fine for now. But that might change in future.
Enabling callback before initializing refcount is a source of refcount imbalance troubles.
I expect your proposal to include

 struct ib_device {
 	(...snipped...)
+	bool device_was_fully_initialized;
 }

 static inline bool ib_device_try_get(struct ib_device *dev)
 {
+	BUG_ON(!device_was_fully_initialized);
 	return refcount_inc_not_zero(&dev->refcount);
 }

 void ib_device_put(struct ib_device *device)
 {
+	BUG_ON(!dev->device_was_fully_initialized);
 	if (refcount_dec_and_test(&device->refcount))
 		complete(&device->unreg_completion);
 }

 static int enable_device_and_get(struct ib_device *device)
 {
 	(...snipped...)
 	refcount_set(&device->refcount, 2);
+	device->device_was_fully_initialized = true;
 	down_write(&devices_rwsem);
 	xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
 	(...snipped...)
 }

for proving that there is no race.

Also, I don't know whether we even need to introduce DEVICE_GID_UPDATES.
Your patch description sounds to me that calling rdma_roce_rescan_device()
generates GIDs with current MAC. Then, why not call rdma_roce_rescan_device()
once again after xa_set_mark(&devices, device->index, DEVICE_REGISTERED) ?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-29  6:06                 ` Tetsuo Handa
@ 2026-01-29 12:38                   ` Jiri Pirko
  2026-01-29 13:48                     ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Pirko @ 2026-01-29 12:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

Thu, Jan 29, 2026 at 07:06:31AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>On 2026/01/28 22:40, Jiri Pirko wrote:
>> Wed, Jan 28, 2026 at 09:26:52AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>>> On 2026/01/28 13:52, Tetsuo Handa wrote:
>>>> Two things I worry about Jiri's patch are that
>>>>
>>>>   refcount_set(&device->refcount, 2) in enable_device_and_get() becomes unsafe if 
>>>>   DEVICE_GID_UPDATES notifications can let someone to call ib_device_put()
>>>
>>> Well, since siw_netdev_event() is calling ib_device_put()
>>> ( https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/infiniband/sw/siw/siw_main.c#L403 ),
>>> calling refcount_set() immediately before setting
>>> xa_set_mark(&devices, device->index, DEVICE_REGISTERED) is no longer safe.
>> 
>> I may be missing something. How exactly is the notifier in siw_main
>> related to this patch? I don't see it depends on DEVICE_GID_UPDATES
>> mark.
>
>I don't have an accurate and complete list of notifier callback functions that are called by
>DEVICE_GID_UPDATES notifications. But in general, I think that there is a pattern that an IB
>notifier callback function tries to grab a reference to "struct ib_device" before doing
>something, and releases that reference after doing something.
>
>If you can prove that none of callback functions that are called by DEVICE_GID_UPDATES
>notifications follows this pattern, it will be fine for now. But that might change in future.
>Enabling callback before initializing refcount is a source of refcount imbalance troubles.
>I expect your proposal to include
>
> struct ib_device {
> 	(...snipped...)
>+	bool device_was_fully_initialized;
> }
>
> static inline bool ib_device_try_get(struct ib_device *dev)
> {
>+	BUG_ON(!device_was_fully_initialized);
> 	return refcount_inc_not_zero(&dev->refcount);
> }
>
> void ib_device_put(struct ib_device *device)
> {
>+	BUG_ON(!dev->device_was_fully_initialized);
> 	if (refcount_dec_and_test(&device->refcount))
> 		complete(&device->unreg_completion);
> }
>
> static int enable_device_and_get(struct ib_device *device)
> {
> 	(...snipped...)
> 	refcount_set(&device->refcount, 2);
>+	device->device_was_fully_initialized = true;
> 	down_write(&devices_rwsem);
> 	xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
> 	(...snipped...)
> }

Since the notifiers I'm trying to fix don't use
ib_device_try_get/ib_device_put, I don't see how your proposal is
related to my fix. Feel free to send it as a separate patch if you like.


>
>for proving that there is no race.
>
>Also, I don't know whether we even need to introduce DEVICE_GID_UPDATES.
>Your patch description sounds to me that calling rdma_roce_rescan_device()
>generates GIDs with current MAC. Then, why not call rdma_roce_rescan_device()
>once again after xa_set_mark(&devices, device->index, DEVICE_REGISTERED) ?

Investigate the rdma_roce_rescan_device() code properly. Unlike what the
name might suggest, this is only initialization. It does not count with
having existing entries in. I think that processing events properly when
they come is a superior solution.

If I'm not mistaken, you didn't pointed out any real issue with my patch,
only some guesses. So do you see any real issue?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-29 12:38                   ` Jiri Pirko
@ 2026-01-29 13:48                     ` Tetsuo Handa
  2026-01-29 14:58                       ` Jiri Pirko
  0 siblings, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-01-29 13:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

On 2026/01/29 21:38, Jiri Pirko wrote:
> Since the notifiers I'm trying to fix don't use
> ib_device_try_get/ib_device_put, I don't see how your proposal is
> related to my fix. Feel free to send it as a separate patch if you like.

I wanted to know an example of the notifiers you are trying to fix, for
that might help understanding how "the ib device REGISTERED and netdev event racing"
results in "unregister_netdevice: waiting for bond1 to become free. Usage count = 2" problem
at https://lkml.kernel.org/r/SJ0PR12MB6806E77B849859B7BAC8CC1CDC89A@SJ0PR12MB6806.namprd12.prod.outlook.com .

> If I'm not mistaken, you didn't pointed out any real issue with my patch,
> only some guesses.

Right.

>                    So do you see any real issue?

Not yet. I'm not yet familiar with rdma code.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-29 13:48                     ` Tetsuo Handa
@ 2026-01-29 14:58                       ` Jiri Pirko
  2026-02-02 14:20                         ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Pirko @ 2026-01-29 14:58 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

Thu, Jan 29, 2026 at 02:48:14PM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>On 2026/01/29 21:38, Jiri Pirko wrote:
>> Since the notifiers I'm trying to fix don't use
>> ib_device_try_get/ib_device_put, I don't see how your proposal is
>> related to my fix. Feel free to send it as a separate patch if you like.
>
>I wanted to know an example of the notifiers you are trying to fix, for
>that might help understanding how "the ib device REGISTERED and netdev event racing"
>results in "unregister_netdevice: waiting for bond1 to become free. Usage count = 2" problem
>at https://lkml.kernel.org/r/SJ0PR12MB6806E77B849859B7BAC8CC1CDC89A@SJ0PR12MB6806.namprd12.prod.outlook.com .

I suggest you read the patch description again.
for example this funtion netdevice_event_work_handler()
which eventually gets to call ib_enum_all_roce_netdevs().
Basically roce_gid_mgmt.c event handlers for netdev/ipv4/ipv5 events.


>
>
>
>> If I'm not mistaken, you didn't pointed out any real issue with my patch,
>> only some guesses.
>
>Right.
>
>>                    So do you see any real issue?
>
>Not yet. I'm not yet familiar with rdma code.

Hmm.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-29 14:58                       ` Jiri Pirko
@ 2026-02-02 14:20                         ` Tetsuo Handa
  2026-02-02 23:51                           ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-02-02 14:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: linux-rdma, leon, msanalla, maorg, parav, mbloch, markzhang,
	marco.crivellari, roman.gushchin, Yanjun.Zhu, Jason Gunthorpe

On 2026/01/29 23:58, Jiri Pirko wrote:
> I suggest you read the patch description again.
> for example this funtion netdevice_event_work_handler()
> which eventually gets to call ib_enum_all_roce_netdevs().

I checked possibility of race.

On 2026/01/27 18:38, Jiri Pirko wrote:
> - The GID table mutex serializes concurrent access between the initial
>   rescan and event handlers

Is this mutex_lock(&table->lock) in __ib_cache_gid_add() ?
https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/infiniband/core/cache.c#L564

> - Event handlers correctly update stale GIDs even when racing with rescan

I couldn't confirm that this is always true. What happens if rdma_roce_rescan_device()
is preempted between make_default_gid() and __ib_cache_gid_add(), and NETDEV_CHANGEADDR
event runs meanwhile? It sounds to me that stale gid is possible because gid value is
calculated before holding the GID table mutex...

rdma_roce_rescan_device() {
  ib_enum_roce_netdev() {
    enum_all_gids_of_dev_cb() {
      add_default_gids() {
        ib_cache_gid_set_default_gid() {
          make_default_gid(ndev, &gid); // GIDs populated with OLD MAC
                                                                ip link set eth4 addr NEW_MAC
                                                                NETDEV_CHANGEADDR queued
                                                                netdevice_event_work_handler() {
                                                                  ib_enum_all_roce_netdevs() {
                                                                    ib_enum_roce_netdev() {
                                                                      enum_all_gids_of_dev_cb() {
                                                                        add_default_gids() {
                                                                          ib_cache_gid_set_default_gid() {
                                                                            make_default_gid(ndev, &gid); // GIDs populated with NEW MAC
                                                                            __ib_cache_gid_add(ib_dev, port, &gid, &gid_attr, mask, true) {
                                                                              mutex_lock(&table->lock);
                                                                              add_modify_gid(table, attr); // Writing GIDs populated with NEW MAC
                                                                              mutex_unlock(&table->lock);
                                                                            }
                                                                          }
                                                                        }
                                                                      }
                                                                    }
                                                                  }
                                                                }
          __ib_cache_gid_add(ib_dev, port, &gid, &gid_attr, mask, true) {
            mutex_lock(&table->lock);
            add_modify_gid(table, attr); // Overwriting with GIDs populated with OLD MAC ?
          }
        }
      }
    }
  }
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-02 14:20                         ` Tetsuo Handa
@ 2026-02-02 23:51                           ` Jason Gunthorpe
  2026-02-03  3:52                             ` Tetsuo Handa
  2026-02-03  9:07                             ` Jiri Pirko
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2026-02-02 23:51 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

On Mon, Feb 02, 2026 at 11:20:22PM +0900, Tetsuo Handa wrote:

> > - Event handlers correctly update stale GIDs even when racing with rescan
> 
> I couldn't confirm that this is always true. What happens if rdma_roce_rescan_device()
> is preempted between make_default_gid() and __ib_cache_gid_add(), and NETDEV_CHANGEADDR
> event runs meanwhile? It sounds to me that stale gid is possible because gid value is
> calculated before holding the GID table mutex...
> 
> rdma_roce_rescan_device() {
>   ib_enum_roce_netdev() {
>     enum_all_gids_of_dev_cb() {
>       add_default_gids() {
>         ib_cache_gid_set_default_gid() {
>           make_default_gid(ndev, &gid); // GIDs populated with OLD MAC
>                                                                 ip link set eth4 addr NEW_MAC
>                                                                 NETDEV_CHANGEADDR queued

I thought this was impossible because enum_all_gids_of_dev_cb() holds
the rtnl_lock()?

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-02 23:51                           ` Jason Gunthorpe
@ 2026-02-03  3:52                             ` Tetsuo Handa
  2026-02-10  8:58                               ` Jiri Pirko
  2026-02-03  9:07                             ` Jiri Pirko
  1 sibling, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-02-03  3:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

On 2026/02/03 8:51, Jason Gunthorpe wrote:
> On Mon, Feb 02, 2026 at 11:20:22PM +0900, Tetsuo Handa wrote:
> 
>>> - Event handlers correctly update stale GIDs even when racing with rescan
>>
>> I couldn't confirm that this is always true. What happens if rdma_roce_rescan_device()
>> is preempted between make_default_gid() and __ib_cache_gid_add(), and NETDEV_CHANGEADDR
>> event runs meanwhile? It sounds to me that stale gid is possible because gid value is
>> calculated before holding the GID table mutex...
>>
>> rdma_roce_rescan_device() {
>>   ib_enum_roce_netdev() {
>>     enum_all_gids_of_dev_cb() {
>>       add_default_gids() {
>>         ib_cache_gid_set_default_gid() {
>>           make_default_gid(ndev, &gid); // GIDs populated with OLD MAC
>>                                                                 ip link set eth4 addr NEW_MAC
>>                                                                 NETDEV_CHANGEADDR queued
> 
> I thought this was impossible because enum_all_gids_of_dev_cb() holds
> the rtnl_lock()?

Indeed, so it is not the GID table mutex but the RTNL mutex that serializes
concurrent access between the initial rescan and event handlers.

Jiri, can you append the race with unregister case to the patch description?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-02 23:51                           ` Jason Gunthorpe
  2026-02-03  3:52                             ` Tetsuo Handa
@ 2026-02-03  9:07                             ` Jiri Pirko
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Pirko @ 2026-02-03  9:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tetsuo Handa, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

Tue, Feb 03, 2026 at 12:51:33AM +0100, jgg@ziepe.ca wrote:
>On Mon, Feb 02, 2026 at 11:20:22PM +0900, Tetsuo Handa wrote:
>
>> > - Event handlers correctly update stale GIDs even when racing with rescan
>> 
>> I couldn't confirm that this is always true. What happens if rdma_roce_rescan_device()
>> is preempted between make_default_gid() and __ib_cache_gid_add(), and NETDEV_CHANGEADDR
>> event runs meanwhile? It sounds to me that stale gid is possible because gid value is
>> calculated before holding the GID table mutex...
>> 
>> rdma_roce_rescan_device() {
>>   ib_enum_roce_netdev() {
>>     enum_all_gids_of_dev_cb() {
>>       add_default_gids() {
>>         ib_cache_gid_set_default_gid() {
>>           make_default_gid(ndev, &gid); // GIDs populated with OLD MAC
>>                                                                 ip link set eth4 addr NEW_MAC
>>                                                                 NETDEV_CHANGEADDR queued
>
>I thought this was impossible because enum_all_gids_of_dev_cb() holds
>the rtnl_lock()?

Yep.

>
>Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-03  3:52                             ` Tetsuo Handa
@ 2026-02-10  8:58                               ` Jiri Pirko
  2026-02-10 10:22                                 ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Pirko @ 2026-02-10  8:58 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jason Gunthorpe, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

Tue, Feb 03, 2026 at 04:52:10AM +0100, penguin-kernel@I-love.SAKURA.ne.jp wrote:
>On 2026/02/03 8:51, Jason Gunthorpe wrote:
>> On Mon, Feb 02, 2026 at 11:20:22PM +0900, Tetsuo Handa wrote:
>> 
>>>> - Event handlers correctly update stale GIDs even when racing with rescan
>>>
>>> I couldn't confirm that this is always true. What happens if rdma_roce_rescan_device()
>>> is preempted between make_default_gid() and __ib_cache_gid_add(), and NETDEV_CHANGEADDR
>>> event runs meanwhile? It sounds to me that stale gid is possible because gid value is
>>> calculated before holding the GID table mutex...
>>>
>>> rdma_roce_rescan_device() {
>>>   ib_enum_roce_netdev() {
>>>     enum_all_gids_of_dev_cb() {
>>>       add_default_gids() {
>>>         ib_cache_gid_set_default_gid() {
>>>           make_default_gid(ndev, &gid); // GIDs populated with OLD MAC
>>>                                                                 ip link set eth4 addr NEW_MAC
>>>                                                                 NETDEV_CHANGEADDR queued
>> 
>> I thought this was impossible because enum_all_gids_of_dev_cb() holds
>> the rtnl_lock()?
>
>Indeed, so it is not the GID table mutex but the RTNL mutex that serializes
>concurrent access between the initial rescan and event handlers.
>
>Jiri, can you append the race with unregister case to the patch description?

I don't understand what exactly you want and why.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-10  8:58                               ` Jiri Pirko
@ 2026-02-10 10:22                                 ` Tetsuo Handa
  2026-02-10 15:42                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Tetsuo Handa @ 2026-02-10 10:22 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jason Gunthorpe, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

On 2026/02/10 17:58, Jiri Pirko wrote:
>> Jiri, can you append the race with unregister case to the patch description?
> 
> I don't understand what exactly you want and why.

I'm currently testing
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?h=next-20260209&id=755d028a40bfbef34f98286b394ab4d30d07e07b
in linux-next tree. Since your patch includes what the above patch does,
your patch description can include the following chunk; though I haven't
identified what is preventing the driver from reaching ib_cache_cleanup_one()
after NETDEV_UNREGISTER became no longer working...



This patch is expected to also address another race syzbot is reporting.

The root cause is a race between netdev event processing and device
unregistration; something is preventing the driver from reaching
ib_cache_cleanup_one() after NETDEV_UNREGISTER became no longer working.
 

  CPU 0 (driver)                    CPU 1 (workqueue)
  ──────────────      ──────────────────────
  __ib_unregister_device()
    disable_device()
      xa_clear_mark(DEVICE_REGISTERED)
        ← Too early, event will be lost

                                    NETDEV_UNREGISTER queued
                                    netdevice_event_work_handler()
                                      ib_enum_all_roce_netdevs()
                                        ← Iterates DEVICE_REGISTERED
                                        ← Device NO LONGER marked, SKIP!
                                        del_gid()
                                          ← GID cannot be deleted
    ib_cache_cleanup_one()
      gid_table_cleanup_one()
        gid_table_cleanup_one()
          del_gid()
            ← all GIDs will be deleted


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-10 10:22                                 ` Tetsuo Handa
@ 2026-02-10 15:42                                   ` Jason Gunthorpe
  2026-02-11  1:30                                     ` Tetsuo Handa
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2026-02-10 15:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

On Tue, Feb 10, 2026 at 07:22:43PM +0900, Tetsuo Handa wrote:

> The root cause is a race between netdev event processing and device
> unregistration; something is preventing the driver from reaching
> ib_cache_cleanup_one() after NETDEV_UNREGISTER became no longer working.
>  
> 
>   CPU 0 (driver)                    CPU 1 (workqueue)
>   ──────────────      ──────────────────────
>   __ib_unregister_device()
>     disable_device()
>       xa_clear_mark(DEVICE_REGISTERED)
>         ← Too early, event will be lost
> 
>                                     NETDEV_UNREGISTER queued
>                                     netdevice_event_work_handler()
>                                       ib_enum_all_roce_netdevs()
>                                         ← Iterates DEVICE_REGISTERED
>                                         ← Device NO LONGER marked, SKIP!
>                                         del_gid()
>                                           ← GID cannot be deleted
>     ib_cache_cleanup_one()
>       gid_table_cleanup_one()
>         gid_table_cleanup_one()
>           del_gid()
>             ← all GIDs will be deleted

This seems like a different thing, why does it matter if an unregister
event is lost like this?

gid_table_cleanup_one() removes all gids regardless of the state of
the netdev? It isn't going to start leaking a netdev reference if you
hit this race?

It isn't going to block unregister from progressing AFAICT.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-02-10 15:42                                   ` Jason Gunthorpe
@ 2026-02-11  1:30                                     ` Tetsuo Handa
  0 siblings, 0 replies; 24+ messages in thread
From: Tetsuo Handa @ 2026-02-11  1:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jiri Pirko, linux-rdma, leon, msanalla, maorg, parav, mbloch,
	markzhang, marco.crivellari, roman.gushchin, Yanjun.Zhu

On 2026/02/11 0:42, Jason Gunthorpe wrote:
> This seems like a different thing, why does it matter if an unregister
> event is lost like this?
> 
> gid_table_cleanup_one() removes all gids regardless of the state of
> the netdev? It isn't going to start leaking a netdev reference if you
> hit this race?

Yes, syzbot is reporting

  unregister_netdevice: waiting for bond1 to become free. Usage count = 2

problem when we hit this unregister race and gid_table_cleanup_one() cannot
be called within 10 seconds after xa_clear_mark(DEVICE_REGISTERED) is called
( https://lkml.kernel.org/r/c1f9511c-7ad0-444d-aa0c-516674702b4e@I-love.SAKURA.ne.jp ).

> 
> It isn't going to block unregister from progressing AFAICT.

Parav Pandit suspects that this race is relevant but you don't think
this race is relevant.

I'm currently monitoring linux-next tree whether there is a refcount leak,
by watching for khungtaskd warning at wait_for_completion() from
disable_device() from __ib_unregister_device(). If syzbot starts reporting
khungtaskd warning instead of

  unregister_netdevice: waiting for bond1 to become free. Usage count = 2

warning in linux-next tree, we can conclude that this unregister race is
a different thing.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
  2026-01-27  9:38 [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration Jiri Pirko
  2026-01-27 11:14 ` Tetsuo Handa
@ 2026-02-24  8:59 ` Leon Romanovsky
  1 sibling, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2026-02-24  8:59 UTC (permalink / raw)
  To: linux-rdma, Jiri Pirko
  Cc: jgg, msanalla, maorg, parav, mbloch, markzhang, marco.crivellari,
	penguin-kernel, roman.gushchin, wangliang74, yanjun.zhu


On Tue, 27 Jan 2026 10:38:39 +0100, Jiri Pirko wrote:
> RoCE GID entries become stale when netdev properties change during the
> IB device registration window. This is reproducible with a udev rule
> that sets a MAC address when a VF netdev appears:
> 
>   ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \
>     RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66"
> 
> [...]

Applied, thanks!

[1/1] RDMA/core: Fix stale RoCE GIDs during netdev events at registration
      https://git.kernel.org/rdma/rdma/c/9af0feae8016ba

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2026-02-24  8:59 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-27  9:38 [PATCH] RDMA/core: Fix stale RoCE GIDs during netdev events at registration Jiri Pirko
2026-01-27 11:14 ` Tetsuo Handa
2026-01-27 11:58   ` Jiri Pirko
2026-01-27 16:00   ` Jason Gunthorpe
2026-01-27 21:17     ` Yanjun.Zhu
2026-01-27 22:16       ` Tetsuo Handa
2026-01-27 22:54         ` Yanjun.Zhu
2026-01-28  4:52           ` Tetsuo Handa
2026-01-28  8:26             ` Tetsuo Handa
2026-01-28 13:40               ` Jiri Pirko
2026-01-29  6:06                 ` Tetsuo Handa
2026-01-29 12:38                   ` Jiri Pirko
2026-01-29 13:48                     ` Tetsuo Handa
2026-01-29 14:58                       ` Jiri Pirko
2026-02-02 14:20                         ` Tetsuo Handa
2026-02-02 23:51                           ` Jason Gunthorpe
2026-02-03  3:52                             ` Tetsuo Handa
2026-02-10  8:58                               ` Jiri Pirko
2026-02-10 10:22                                 ` Tetsuo Handa
2026-02-10 15:42                                   ` Jason Gunthorpe
2026-02-11  1:30                                     ` Tetsuo Handa
2026-02-03  9:07                             ` Jiri Pirko
2026-01-28 13:43             ` Jiri Pirko
2026-02-24  8:59 ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox