From: Jason Gunthorpe <jgg@ziepe.ca>
To: Jiri Pirko <jiri@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
syzbot <syzbot+03393ff6c35fd2cc43de@syzkaller.appspotmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"syzkaller-bugs@googlegroups.com"
<syzkaller-bugs@googlegroups.com>
Subject: Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
Date: Mon, 13 Apr 2026 14:42:28 -0300
Message-ID: <20260413174228.GQ3694781@ziepe.ca>
In-Reply-To: <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
> Will check it tmrw
I fed it to Claude and after 40 mins it is stumped too. It should not
be possible for this to happen.
__ib_unregister_device() always calls down to disable_device(),
which always removes it from all visibility, drives the refcount to 0
and then cleans the xarray:

	xa_for_each (&device->compat_devs, index, cdev)
		remove_one_compat_dev(device, index);

Then ib_dealloc_device() checks it is empty:

	WARN_ON(!xa_empty(&device->compat_devs));

At the point the xa_for_each is run there should be no concurrent
threads that can see the device. The refcount is zero and it was
removed from the xarray. add_one_compat_dev() is never called in a
condition that could see a stray device.
It should not be possible for the compat_devs of a 0 refcount
ib_device removed from the device's xarray to be mutated between those
two checks.
One notable thing about xarray is you can have an xa_for_each() iterate
over nothing and also have xa_empty() be false. Maybe that is
happening here, but I could not find any way that should happen.
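
As a rough illustration of that xa_for_each()/xa_empty() mismatch, here
is a userspace toy model in C. It is not the kernel xarray: struct
toy_xarray, the toy_* helpers and TOY_ZERO_ENTRY are all made up for
this sketch. It only assumes that a reserved slot reads back as NULL to
normal users while still counting as occupied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for the internal "zero entry" left by a reserve. */
static char toy_zero_marker;
#define TOY_ZERO_ENTRY ((void *)&toy_zero_marker)

struct toy_xarray {
	void *slot[TOY_SLOTS];
};

/* Reserve an index: the slot becomes occupied but holds no user pointer. */
static void toy_reserve(struct toy_xarray *xa, size_t index)
{
	assert(index < TOY_SLOTS);
	if (!xa->slot[index])
		xa->slot[index] = TOY_ZERO_ENTRY;
}

/* Store a user pointer, overwriting any reservation at that index. */
static void toy_store(struct toy_xarray *xa, size_t index, void *ptr)
{
	assert(index < TOY_SLOTS);
	xa->slot[index] = ptr;
}

/* Normal lookups report a reserved slot as NULL. */
static void *toy_load(const struct toy_xarray *xa, size_t index)
{
	assert(index < TOY_SLOTS);
	void *entry = xa->slot[index];

	return entry == TOY_ZERO_ENTRY ? NULL : entry;
}

/* Emptiness looks at raw occupancy, so a reserved slot still counts. */
static bool toy_empty(const struct toy_xarray *xa)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (xa->slot[i])
			return false;
	return true;
}

/*
 * Walk only the entries a normal user can see and erase each one,
 * returning how many were visited. Reserved slots are skipped.
 */
static size_t toy_erase_all_visible(struct toy_xarray *xa)
{
	size_t visited = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_load(xa, i))
			continue;
		xa->slot[i] = NULL;
		visited++;
	}
	return visited;
}
```

In this model, storing one pointer and reserving one other index, then
calling toy_erase_all_visible(), leaves toy_empty() returning false even
though the walk removed everything it could see.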
I guess just keep watching this and see if it happens ever again. Add
some debugging to print out the xarray. Maybe the way we are using
xarray is unexpectedly triggering a stray 0 entry?
Jason
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4c174f7f1070cb..592e29b0cccf39 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1020,6 +1020,71 @@ static void remove_compat_devs(struct ib_device *device)
 	xa_for_each (&device->compat_devs, index, cdev)
 		remove_one_compat_dev(device, index);
+
+	if (!xa_empty(&device->compat_devs)) {
+		struct xa_node *node;
+		void *head;
+		unsigned int i;
+
+		dev_warn(&device->dev,
+			 "compat_devs xarray not empty after removal!\n");
+
+		xa_lock(&device->compat_devs);
+		head = xa_head_locked(&device->compat_devs);
+		dev_warn(&device->dev, " xa_head=%px xa_flags=%x\n",
+			 head, device->compat_devs.xa_flags);
+
+		if (!xa_is_node(head)) {
+			/* Single entry at index 0 stored directly in head */
+			if (xa_is_zero(head))
+				dev_warn(&device->dev,
+					 " head[0]: zero entry (leaked xa_reserve)\n");
+			else if (!xa_is_internal(head))
+				dev_warn(&device->dev,
+					 " head[0]: pointer %px\n", head);
+			else
+				dev_warn(&device->dev,
+					 " head[0]: internal %px (%lu)\n",
+					 head, xa_to_internal(head));
+		} else {
+			node = xa_to_node(head);
+			dev_warn(&device->dev,
+				 " node %px shift %d count %d nr_values %d\n",
+				 node, node->shift, node->count,
+				 node->nr_values);
+			for (i = 0; i < XA_CHUNK_SIZE; i++) {
+				void *entry = xa_entry_locked(
+					&device->compat_devs, node, i);
+
+				if (!entry)
+					continue;
+				if (xa_is_zero(entry))
+					dev_warn(&device->dev,
+						 " slot[%u]: zero entry (leaked xa_reserve)\n",
+						 i);
+				else if (xa_is_sibling(entry))
+					dev_warn(&device->dev,
+						 " slot[%u]: sibling -> slot %lu\n",
+						 i, xa_to_sibling(entry));
+				else if (xa_is_retry(entry))
+					dev_warn(&device->dev,
+						 " slot[%u]: retry\n", i);
+				else if (xa_is_node(entry))
+					dev_warn(&device->dev,
+						 " slot[%u]: node %px (deeper tree)\n",
+						 i, xa_to_node(entry));
+				else if (!xa_is_internal(entry))
+					dev_warn(&device->dev,
+						 " slot[%u]: pointer %px\n",
+						 i, entry);
+				else
+					dev_warn(&device->dev,
+						 " slot[%u]: unknown internal %px\n",
+						 i, entry);
+			}
+		}
+		xa_unlock(&device->compat_devs);
+	}
 }
static int add_compat_devs(struct ib_device *device)