public inbox for linux-rdma@vger.kernel.org
* [syzbot] [rdma?] WARNING in ib_dealloc_device
@ 2026-04-13  0:04 syzbot
  2026-04-13 15:43 ` Leon Romanovsky
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2026-04-13  0:04 UTC
  To: jgg, leon, linux-kernel, linux-rdma, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    7f87a5ea75f0 Merge tag 'hid-for-linus-2026040801' of git:/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11778eba580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=45cb3c58fd963c27
dashboard link: https://syzkaller.appspot.com/bug?extid=03393ff6c35fd2cc43de
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0f5deca1373e/disk-7f87a5ea.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6aea6c1c6b6e/vmlinux-7f87a5ea.xz
kernel image: https://storage.googleapis.com/syzbot-assets/61444b289e96/bzImage-7f87a5ea.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+03393ff6c35fd2cc43de@syzkaller.appspotmail.com

------------[ cut here ]------------
!xa_empty(&device->compat_devs)
WARNING: drivers/infiniband/core/device.c:682 at ib_dealloc_device+0x187/0x200 drivers/infiniband/core/device.c:682, CPU#0: kworker/u8:37/4856
Modules linked in:
CPU: 0 UID: 0 PID: 4856 Comm: kworker/u8:37 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)} 
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: ib-unreg-wq ib_unregister_work
RIP: 0010:ib_dealloc_device+0x187/0x200 drivers/infiniband/core/device.c:682
Code: e8 de ec ad f9 48 89 df e8 56 59 07 00 48 81 c3 30 08 00 00 48 89 df 5b 41 5c 41 5e 41 5f 5d e9 0f 09 60 fd e8 ba ec ad f9 90 <0f> 0b 90 e9 72 ff ff ff e8 ac ec ad f9 90 0f 0b 90 eb 8f e8 a1 ec
RSP: 0018:ffffc9000f49fa18 EFLAGS: 00010293
RAX: ffffffff88169536 RBX: ffff888039d40000 RCX: ffff88806a691e80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff888039d41308 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: fffffbfff1ed4eb7 R12: 1ffff110073a81fd
R13: dffffc0000000000 R14: ffff888039d41268 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff888126332000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff6d2897e9c CR3: 0000000022382000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000f1ffffdf
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __ib_unregister_device+0x393/0x3f0 drivers/infiniband/core/device.c:1545
 ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1639
 process_one_work kernel/workqueue.c:3276 [inline]
 process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
  2026-04-13  0:04 [syzbot] [rdma?] WARNING in ib_dealloc_device syzbot
@ 2026-04-13 15:43 ` Leon Romanovsky
       [not found]   ` <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
  2026-04-14 15:57   ` Jiri Pirko
  0 siblings, 2 replies; 7+ messages in thread
From: Leon Romanovsky @ 2026-04-13 15:43 UTC
  To: syzbot; +Cc: jgg, linux-kernel, linux-rdma, syzkaller-bugs, Jiri Pirko

On Sun, Apr 12, 2026 at 05:04:32PM -0700, syzbot wrote:
> ------------[ cut here ]------------
> !xa_empty(&device->compat_devs)
> WARNING: drivers/infiniband/core/device.c:682 at ib_dealloc_device+0x187/0x200 drivers/infiniband/core/device.c:682, CPU#0: kworker/u8:37/4856

I think that we have only one patch in this area https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us

Thanks


* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
       [not found]   ` <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
@ 2026-04-13 17:42     ` Jason Gunthorpe
  2026-04-14 10:47       ` Leon Romanovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2026-04-13 17:42 UTC
  To: Jiri Pirko
  Cc: Leon Romanovsky, syzbot, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, syzkaller-bugs@googlegroups.com

On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
>    Will check it tmrw

I fed it to Claude and after 40 mins it is stumped too.. It should not
be possible for this to happen.

__ib_unregister_device() always calls down to disable_device()

Which always removes it from all visibility, drives the refcount to 0
and then cleans the xarray:

	xa_for_each (&device->compat_devs, index, cdev)
		remove_one_compat_dev(device, index);

Then ib_dealloc_device() checks it is empty:

	WARN_ON(!xa_empty(&device->compat_devs));

At the point the xa_for_each is run there should be no concurrent
threads that can see the device. The refcount is zero and it was removed
from the xarray. The add_one_compat_dev() is never called in a
condition that could see a stray device.

It should not be possible for the compat_devs of a 0 refcount
ib_device removed from the device's xarray to be mutated between those
two checks.

One notable thing about xarray is you can have an xa_for_each() iterate
over nothing and also have xa_empty() be false. Maybe that is
happening here, but I could not find any way that could happen.

I guess just keep watching this and see if it happens ever again. Add
some debugging to print out the xarray. Maybe the way we are using
xarray is unexpectedly triggering a stray 0 entry?

Jason

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4c174f7f1070cb..592e29b0cccf39 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1020,6 +1020,71 @@ static void remove_compat_devs(struct ib_device *device)
 
 	xa_for_each (&device->compat_devs, index, cdev)
 		remove_one_compat_dev(device, index);
+
+	if (!xa_empty(&device->compat_devs)) {
+		struct xa_node *node;
+		void *head;
+		unsigned int i;
+
+		dev_warn(&device->dev,
+			 "compat_devs xarray not empty after removal!\n");
+
+		xa_lock(&device->compat_devs);
+		head = xa_head_locked(&device->compat_devs);
+		dev_warn(&device->dev, "  xa_head=%px xa_flags=%x\n",
+			 head, device->compat_devs.xa_flags);
+
+		if (!xa_is_node(head)) {
+			/* Single entry at index 0 stored directly in head */
+			if (xa_is_zero(head))
+				dev_warn(&device->dev,
+					 "  head[0]: zero entry (leaked xa_reserve)\n");
+			else if (!xa_is_internal(head))
+				dev_warn(&device->dev,
+					 "  head[0]: pointer %px\n", head);
+			else
+				dev_warn(&device->dev,
+					 "  head[0]: internal %px (%lu)\n",
+					 head, xa_to_internal(head));
+		} else {
+			node = xa_to_node(head);
+			dev_warn(&device->dev,
+				 "  node %px shift %d count %d nr_values %d\n",
+				 node, node->shift, node->count,
+				 node->nr_values);
+			for (i = 0; i < XA_CHUNK_SIZE; i++) {
+				void *entry = xa_entry_locked(
+					&device->compat_devs, node, i);
+
+				if (!entry)
+					continue;
+				if (xa_is_zero(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: zero entry (leaked xa_reserve)\n",
+						 i);
+				else if (xa_is_sibling(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: sibling -> slot %lu\n",
+						 i, xa_to_sibling(entry));
+				else if (xa_is_retry(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: retry\n", i);
+				else if (xa_is_node(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: node %px (deeper tree)\n",
+						 i, xa_to_node(entry));
+				else if (!xa_is_internal(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: pointer %px\n",
+						 i, entry);
+				else
+					dev_warn(&device->dev,
+						 "  slot[%u]: unknown internal %px\n",
+						 i, entry);
+			}
+		}
+		xa_unlock(&device->compat_devs);
+	}
 }
 
 static int add_compat_devs(struct ib_device *device)

* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
  2026-04-13 17:42     ` Jason Gunthorpe
@ 2026-04-14 10:47       ` Leon Romanovsky
  2026-04-14 12:18         ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2026-04-14 10:47 UTC
  To: Jason Gunthorpe
  Cc: Jiri Pirko, syzbot, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, syzkaller-bugs@googlegroups.com

On Mon, Apr 13, 2026 at 02:42:28PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
> >    Will check it tmrw
> 
> I fed it to Claude and after 40 mins it is stumped too.. It should not
> be possible for this to happen.

Interesting, I used Chris's prompts for this debug and got the following
suggestions (CONFIG_PREEMPT_RT=y in this .config):

------------------------------------------------------------------------
REMAINING HYPOTHESES
------------------------------------------------------------------------

1. PREEMPT_RT rwsem behavior (most likely for syzkaller SOFTLOCKUP trigger):
   Under PREEMPT_RT, down_write/down_read use rt_mutex internally. Priority
   inheritance and preemption semantics differ from non-RT. There may be a
   window in the rwsem downgrade path inside enable_device_and_get (which
   downgrades from WRITE to READ after setting DEVICE_REGISTERED) that allows
   a concurrent disable_device to observe an inconsistent state.

   Specifically: enable_device_and_get does:
     down_write(devices_rwsem)
     xa_set_mark(DEVICE_REGISTERED)
     downgrade_write(devices_rwsem)  [WRITE -> READ]
     add_compat_devs()
     up_read(devices_rwsem)

   Under PREEMPT_RT, could disable_device acquire WRITE between the xa_set_mark
   and downgrade_write? If so, it would clear DEVICE_REGISTERED while
   add_compat_devs is about to run (but hasn't yet seen the mark cleared).

2. xa_for_each skipping entries during concurrent xa_erase restructuring:
   If rdma_dev_exit_net's remove_one_compat_dev erases an entry concurrently
   with remove_compat_devs iterating, xas_shrink (called inside xa_erase) could
   restructure the xarray tree. If xa_find_after then traverses a restructured
   tree and skips a subsequent entry, that entry remains in compat_devs.

   This is subtle: xa_erase takes the xarray spinlock (or rt_mutex), but
   xa_for_each calls xa_find_after under RCU. The RCU read side might see a
   partially-restructured tree that looks different from the spinlock-visible
   view. Under PREEMPT_RT, RCU critical sections can be longer.

3. rdma_compatdev_set (ib_devices_shared_netns sysctl) race:
   add_all_compat_devs() is guarded by DEVICE_REGISTERED + devices_rwsem, so
   the same analysis as T3a applies and the race is eliminated. However, if
   there is a remove_all_compat_devs() implementation, its interaction with
   the unregistration flow deserves verification.

------------------------------------------------------------------------
RECOMMENDED NEXT STEPS
------------------------------------------------------------------------

1. Add WARN_ON with stack trace inside add_one_compat_dev (compat_devs_mutex
   held) that fires if DEVICE_REGISTERED is not set. If this never fires, the
   insertion is always properly gated. If it fires, the unmarked insertion path
   is identified.

2. Add ftrace or kprobe on remove_compat_devs and add_one_compat_dev to
   capture the exact sequence of events. The key question: does any
   add_one_compat_dev call happen AFTER remove_compat_devs for the same device?

3. Check whether the bug exists on non-PREEMPT_RT kernels. If only PREEMPT_RT
   is affected, hypothesis 1 (rwsem downgrade race) or hypothesis 2 (RCU/xarray
   interaction) is more likely.

4. Look at the kernel version of the syzkaller report. Check git log for any
   changes to drivers/infiniband/core/device.c around the report date that may
   have introduced or fixed the issue.

5. Investigate enable_device_and_get's downgrade_write() path -- specifically
   whether a concurrent disable_device can observe DEVICE_REGISTERED set between
   xa_set_mark and the downgrade, then fail to clear it before add_compat_devs runs.

------------------------------------------------------------------------

* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
  2026-04-14 10:47       ` Leon Romanovsky
@ 2026-04-14 12:18         ` Jason Gunthorpe
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2026-04-14 12:18 UTC
  To: Leon Romanovsky
  Cc: Jiri Pirko, syzbot, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, syzkaller-bugs@googlegroups.com

On Tue, Apr 14, 2026 at 01:47:01PM +0300, Leon Romanovsky wrote:
> On Mon, Apr 13, 2026 at 02:42:28PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
> > >    Will check it tmrw
> > 
> > I fed it to Claude and after 40 mins it is stumped too.. It should not
> > be possible for this to happen.
> 
> Interesting, I used Chris's prompts for this debug and got the following
> suggestions (CONFIG_PREEMPT_RT=y in this .config):
> 
> ------------------------------------------------------------------------
> REMAINING HYPOTHESES
> ------------------------------------------------------------------------
> 
> 1. PREEMPT_RT rwsem behavior (most likely for syzkaller SOFTLOCKUP trigger):
>    Under PREEMPT_RT, down_write/down_read use rt_mutex internally. Priority
>    inheritance and preemption semantics differ from non-RT. There may be a
>    window in the rwsem downgrade path inside enable_device_and_get (which
>    downgrades from WRITE to READ after setting DEVICE_REGISTERED) that allows
>    a concurrent disable_device to observe an inconsistent state.

Is this actually true? What is the point of implementing
downgrade_write like this?

>    Specifically: enable_device_and_get does:
>      down_write(devices_rwsem)
>      xa_set_mark(DEVICE_REGISTERED)
>      downgrade_write(devices_rwsem)  [WRITE -> READ]
>      add_compat_devs()
>      up_read(devices_rwsem)
> 
>    Under PREEMPT_RT, could disable_device acquire WRITE between the xa_set_mark
>    and downgrade_write? If so, it would clear DEVICE_REGISTERED while
>    add_compat_devs is about to run (but hasn't yet seen the mark cleared).

This is half a thought. Okay, so even if they race, the entry to
remove_compat_devs() is still gated by

	/* Pairs with refcount_set in enable_device */
	ib_device_put(device);
	wait_for_completion(&device->unreg_completion);

And we still have the refcount guarding it:

	refcount_set(&device->refcount, 2);
	down_write(&devices_rwsem);
	xa_set_mark(&devices, device->index, DEVICE_REGISTERED);

So we can't race add_compat_devs and remove_compat_devs() like this
unless there is some way for the refcount to have been dropped to zero
also. I don't think there is.

> 2. xa_for_each skipping entries during concurrent xa_erase restructuring:
>    If rdma_dev_exit_net's remove_one_compat_dev erases an entry concurrently
>    with remove_compat_devs iterating, xas_shrink (called inside xa_erase) could
>    restructure the xarray tree. If xa_find_after then traverses a restructured
>    tree and skips a subsequent entry, that entry remains in compat_devs.

This race is also impossible due to the mark and the refcount.

>    This is subtle: xa_erase takes the xarray spinlock (or rt_mutex), but
>    xa_for_each calls xa_find_after under RCU. The RCU read side might see a
>    partially-restructured tree that looks different from the spinlock-visible
>    view. Under PREEMPT_RT, RCU critical sections can be longer.
> 
> 3. rdma_compatdev_set (ib_devices_shared_netns sysctl) race:
>    add_all_compat_devs() is guarded by DEVICE_REGISTERED + devices_rwsem, so
>    the same analysis as T3a applies and the race is eliminated. However, if
>    there is a remove_all_compat_devs() implementation, its interaction with
>    the unregistration flow deserves verification.

Huh? your claude has lost its mind :)

Jason

* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
  2026-04-13 15:43 ` Leon Romanovsky
       [not found]   ` <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
@ 2026-04-14 15:57   ` Jiri Pirko
  2026-04-16  8:10     ` Jiri Pirko
  1 sibling, 1 reply; 7+ messages in thread
From: Jiri Pirko @ 2026-04-14 15:57 UTC
  To: Leon Romanovsky
  Cc: syzbot, jgg, linux-kernel, linux-rdma, syzkaller-bugs, Jiri Pirko

Mon, Apr 13, 2026 at 05:43:53PM +0200, leon@kernel.org wrote:
>On Sun, Apr 12, 2026 at 05:04:32PM -0700, syzbot wrote:
>> ------------[ cut here ]------------
>> !xa_empty(&device->compat_devs)
>> WARNING: drivers/infiniband/core/device.c:682 at ib_dealloc_device+0x187/0x200 drivers/infiniband/core/device.c:682, CPU#0: kworker/u8:37/4856
>
>I think that we have only one patch in this area https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us

Unable to find a link to this patch. But I don't see a scenario in which
this WARN can happen either. Very odd.


* Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
  2026-04-14 15:57   ` Jiri Pirko
@ 2026-04-16  8:10     ` Jiri Pirko
  0 siblings, 0 replies; 7+ messages in thread
From: Jiri Pirko @ 2026-04-16  8:10 UTC
  To: Leon Romanovsky
  Cc: syzbot, jgg, linux-kernel, linux-rdma, syzkaller-bugs, Jiri Pirko

Tue, Apr 14, 2026 at 05:57:19PM +0200, jiri@resnulli.us wrote:
>Mon, Apr 13, 2026 at 05:43:53PM +0200, leon@kernel.org wrote:
>>On Sun, Apr 12, 2026 at 05:04:32PM -0700, syzbot wrote:
>>> ------------[ cut here ]------------
>>> !xa_empty(&device->compat_devs)
>>> WARNING: drivers/infiniband/core/device.c:682 at ib_dealloc_device+0x187/0x200 drivers/infiniband/core/device.c:682, CPU#0: kworker/u8:37/4856
>>
>>I think that we have only one patch in this area https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us
>
>Unable to find a link to this patch. But I don't see a scenario in which
>this WARN can happen either. Very odd.

Was digging a bit more, still unable to find the issue.

Thread overview: 7+ messages
2026-04-13  0:04 [syzbot] [rdma?] WARNING in ib_dealloc_device syzbot
2026-04-13 15:43 ` Leon Romanovsky
     [not found]   ` <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
2026-04-13 17:42     ` Jason Gunthorpe
2026-04-14 10:47       ` Leon Romanovsky
2026-04-14 12:18         ` Jason Gunthorpe
2026-04-14 15:57   ` Jiri Pirko
2026-04-16  8:10     ` Jiri Pirko
