public inbox for linux-kernel@vger.kernel.org
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Jiri Pirko <jiri@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	syzbot <syzbot+03393ff6c35fd2cc43de@syzkaller.appspotmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"syzkaller-bugs@googlegroups.com"
	<syzkaller-bugs@googlegroups.com>
Subject: Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
Date: Mon, 13 Apr 2026 14:42:28 -0300
Message-ID: <20260413174228.GQ3694781@ziepe.ca>
In-Reply-To: <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>

On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
>    Will check it tmrw

I fed it to Claude and after 40 mins it is stumped too. It should not
be possible for this to happen.

__ib_unregister_device() always calls down to disable_device(), which
always removes it from all visibility, drives the refcount to 0 and
then cleans the xarray:

	xa_for_each (&device->compat_devs, index, cdev)
		remove_one_compat_dev(device, index);

Then ib_dealloc_device() checks it is empty:

	WARN_ON(!xa_empty(&device->compat_devs));

At the point the xa_for_each is run there should be no concurrent
threads that can see the device. The refcount is zero and it was
removed from the xarray. add_one_compat_dev() is never called in a
condition that could see a stray device.

It should not be possible for the compat_devs of a 0 refcount
ib_device removed from the device's xarray to be mutated between those
two checks.

One notable thing about xarray is that you can have an xa_for_each()
iterate over nothing and still have xa_empty() be false. Maybe that is
happening here, but I could not find any way that should happen.
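
To make that case concrete, here is a minimal userspace sketch. It is a
toy model and not the kernel xarray: every model_* name is hypothetical,
and the only behaviour it assumes is that a reserved slot is skipped by
iteration while still counting as occupied for the emptiness check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model, NOT the kernel xarray. All model_* names are hypothetical. */
#define MODEL_SLOTS 8

/* Stand-in for a reserved ("zero") entry: it occupies a slot but is hidden from iteration. */
static char model_zero_marker;
#define MODEL_ZERO_ENTRY ((void *)&model_zero_marker)

struct model_xa {
	void *slot[MODEL_SLOTS];
};

/* Reserve a slot without publishing a pointer in it. */
static void model_reserve(struct model_xa *xa, size_t index)
{
	if (!xa->slot[index])
		xa->slot[index] = MODEL_ZERO_ENTRY;
}

/* Publish a real pointer in a slot. */
static void model_store(struct model_xa *xa, size_t index, void *entry)
{
	xa->slot[index] = entry;
}

/* Emptiness looks at the raw slot contents, so a reserved slot counts as occupied. */
static bool model_empty(const struct model_xa *xa)
{
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (xa->slot[i])
			return false;
	return true;
}

/* Return the next visible entry at or after *index, skipping reserved slots. */
static void *model_find(const struct model_xa *xa, size_t *index)
{
	for (size_t i = *index; i < MODEL_SLOTS; i++) {
		void *entry = xa->slot[i];

		if (entry && entry != MODEL_ZERO_ENTRY) {
			*index = i;
			return entry;
		}
	}
	return NULL;
}

/* Erase every visible entry, like the removal loop, and return how many were visited. */
static int model_remove_all(struct model_xa *xa)
{
	size_t index = 0;
	int visited = 0;

	while (model_find(xa, &index)) {
		xa->slot[index] = NULL;
		visited++;
		index++;
	}
	return visited;
}
```

In this model, reserving slot 0 and then calling model_remove_all()
visits nothing that was reserved, yet model_empty() stays false. That is
the shape of the failure the WARN_ON would report if a reserved entry
were left behind.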

I guess just keep watching this and see if it ever happens again. Add
some debugging to print out the xarray. Maybe the way we are using
xarray is unexpectedly triggering a stray 0 entry?
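
The debugging classifies each leftover slot by its tag bits. As a rough
userspace sketch of that decision order: the sk_* names are hypothetical,
and the encoding (internal entries have low bits 10, retry and zero
entries are internal values 256 and 257, nodes are internal entries above
4096, 64 slots per node) is an assumption based on my reading of
include/linux/xarray.h, so check it against the tree in use.

```c
#include <stdint.h>

/*
 * Userspace sketch only. The encoding is an assumption modelled on
 * include/linux/xarray.h; all sk_* names are hypothetical.
 */
#define SK_CHUNK_SIZE 64UL

enum sk_kind {
	SK_EMPTY,
	SK_ZERO,
	SK_SIBLING,
	SK_RETRY,
	SK_NODE,
	SK_POINTER,
	SK_UNKNOWN_INTERNAL,
};

/* Internal entries carry the value shifted left by two with low bits 10. */
static uintptr_t sk_mk_internal(uintptr_t v)
{
	return (v << 2) | 2;
}

static int sk_is_internal(uintptr_t e)
{
	return (e & 3) == 2;
}

/* Classify a raw slot value in the same order the debug patch tests it. */
static enum sk_kind sk_classify(uintptr_t e)
{
	if (!e)
		return SK_EMPTY;
	if (e == sk_mk_internal(257))
		return SK_ZERO;		/* reserved slot, hidden from iteration */
	if (sk_is_internal(e) && e < sk_mk_internal(SK_CHUNK_SIZE - 1))
		return SK_SIBLING;	/* points at an earlier slot in the node */
	if (e == sk_mk_internal(256))
		return SK_RETRY;
	if (sk_is_internal(e) && e > 4096)
		return SK_NODE;		/* pointer to a deeper level of the tree */
	if (!sk_is_internal(e))
		return SK_POINTER;	/* ordinary user pointer */
	return SK_UNKNOWN_INTERNAL;
}
```

If the stray 0 entry theory is right, the only non-empty result the dump
should produce is the zero entry case.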

Jason

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4c174f7f1070cb..592e29b0cccf39 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1020,6 +1020,71 @@ static void remove_compat_devs(struct ib_device *device)
 
 	xa_for_each (&device->compat_devs, index, cdev)
 		remove_one_compat_dev(device, index);
+
+	if (!xa_empty(&device->compat_devs)) {
+		struct xa_node *node;
+		void *head;
+		unsigned int i;
+
+		dev_warn(&device->dev,
+			 "compat_devs xarray not empty after removal!\n");
+
+		xa_lock(&device->compat_devs);
+		head = xa_head_locked(&device->compat_devs);
+		dev_warn(&device->dev, "  xa_head=%px xa_flags=%x\n",
+			 head, device->compat_devs.xa_flags);
+
+		if (!xa_is_node(head)) {
+			/* Single entry at index 0 stored directly in head */
+			if (xa_is_zero(head))
+				dev_warn(&device->dev,
+					 "  head[0]: zero entry (leaked xa_reserve)\n");
+			else if (!xa_is_internal(head))
+				dev_warn(&device->dev,
+					 "  head[0]: pointer %px\n", head);
+			else
+				dev_warn(&device->dev,
+					 "  head[0]: internal %px (%lu)\n",
+					 head, xa_to_internal(head));
+		} else {
+			node = xa_to_node(head);
+			dev_warn(&device->dev,
+				 "  node %px shift %d count %d nr_values %d\n",
+				 node, node->shift, node->count,
+				 node->nr_values);
+			for (i = 0; i < XA_CHUNK_SIZE; i++) {
+				void *entry = xa_entry_locked(
+					&device->compat_devs, node, i);
+
+				if (!entry)
+					continue;
+				if (xa_is_zero(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: zero entry (leaked xa_reserve)\n",
+						 i);
+				else if (xa_is_sibling(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: sibling -> slot %lu\n",
+						 i, xa_to_sibling(entry));
+				else if (xa_is_retry(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: retry\n", i);
+				else if (xa_is_node(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: node %px (deeper tree)\n",
+						 i, xa_to_node(entry));
+				else if (!xa_is_internal(entry))
+					dev_warn(&device->dev,
+						 "  slot[%u]: pointer %px\n",
+						 i, entry);
+				else
+					dev_warn(&device->dev,
+						 "  slot[%u]: unknown internal %px\n",
+						 i, entry);
+			}
+		}
+		xa_unlock(&device->compat_devs);
+	}
 }
 
 static int add_compat_devs(struct ib_device *device)

Thread overview: 3+ messages
2026-04-13  0:04 [syzbot] [rdma?] WARNING in ib_dealloc_device syzbot
2026-04-13 15:43 ` Leon Romanovsky
     [not found]   ` <PH7PR12MB66356E0176748BFFF081D9B4B0242@PH7PR12MB6635.namprd12.prod.outlook.com>
2026-04-13 17:42     ` Jason Gunthorpe [this message]