From: Jason Gunthorpe <jgg@ziepe.ca>
To: Leon Romanovsky <leon@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>,
linux-rdma@vger.kernel.org,
syzbot+b0da83a6c0e2e2bddbd4@syzkaller.appspotmail.com
Subject: Re: [PATCH rdma-next v2 1/1] RDMA/core: Fix WARNING in gid_table_release_one
Date: Wed, 5 Nov 2025 09:45:24 -0400
Message-ID: <20251105134524.GL1204670@ziepe.ca>
In-Reply-To: <20251105130958.GE16832@unreal>

On Wed, Nov 05, 2025 at 03:09:58PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 04, 2025 at 03:36:01PM -0800, Zhu Yanjun wrote:
> > GID entry ref leak for dev syz1 index 2 ref=615
> > ...
> > Call Trace:
> > <TASK>
> > ib_device_release+0xd2/0x1c0 drivers/infiniband/core/device.c:509
> > device_release+0x99/0x1c0 drivers/base/core.c:-1
> > kobject_cleanup lib/kobject.c:689 [inline]
> > kobject_release lib/kobject.c:720 [inline]
> > kref_put include/linux/kref.h:65 [inline]
> > kobject_put+0x228/0x480 lib/kobject.c:737
> > process_one_work kernel/workqueue.c:3263 [inline]
> > process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3346
> > worker_thread+0x8a0/0xda0 kernel/workqueue.c:3427
> > kthread+0x711/0x8a0 kernel/kthread.c:463
> > ret_from_fork+0x47c/0x820 arch/x86/kernel/process.c:158
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > </TASK>
> >
> > When the state of a GID is GID_TABLE_ENTRY_PENDING_DEL, it indicates
> > that the GID is about to be released soon. Therefore, it does not
> > appear to be a leak.
> >
> > Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts")
> > Reported-by: syzbot+b0da83a6c0e2e2bddbd4@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=b0da83a6c0e2e2bddbd4
> > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> > ---
> > V1->V2: Use flush_workqueue instead of while loop
> > ---
> > drivers/infiniband/core/cache.c | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > index 81cf3c902e81..74211fb37020 100644
> > --- a/drivers/infiniband/core/cache.c
> > +++ b/drivers/infiniband/core/cache.c
> > @@ -799,16 +799,26 @@ static void release_gid_table(struct ib_device *device,
> > if (!table)
> > return;
> >
> > + mutex_lock(&table->lock);
> > for (i = 0; i < table->sz; i++) {
> > if (is_gid_entry_free(table->data_vec[i]))
> > continue;
> >
> > - WARN_ONCE(true,
> > - "GID entry ref leak for dev %s index %d ref=%u\n",
> > + WARN_ONCE(table->data_vec[i]->state != GID_TABLE_ENTRY_PENDING_DEL,
> > + "GID entry ref leak for dev %s index %d ref=%u, state: %d\n",
> > dev_name(&device->dev), i,
> > - kref_read(&table->data_vec[i]->kref));
> > + kref_read(&table->data_vec[i]->kref), table->data_vec[i]->state);
> > + /*
> > + * The entry may be sitting in the WQ waiting for
> > + * free_gid_work(), flush it to try to clean it.
> > + */
> > + mutex_unlock(&table->lock);
> > + flush_workqueue(ib_wq);
> > + mutex_lock(&table->lock);
>
> I can't agree with idea that flush_workqueue() is called in the loop.

Since we almost never see these WARN_ONs, it isn't really called in a
loop, but sure, you could put a conditional around it to do it only
once.

The WARN_ONCE is in the wrong order: it is not a kernel bug if the
work is still pending in the workqueue. Flush the queue, check again,
and only then do the warn.

@@ -791,22 +791,31 @@ static struct ib_gid_table *alloc_gid_table(int sz)
 	return NULL;
 }

-static void release_gid_table(struct ib_device *device,
-			      struct ib_gid_table *table)
+static bool is_gid_table_clean(struct ib_gid_table *table)
 {
 	int i;

+	guard(mutex)(&table->lock);
+	for (i = 0; i < table->sz; i++)
+		if (!is_gid_entry_free(table->data_vec[i]))
+			return false;
+	return true;
+}
+
+static void release_gid_table(struct ib_device *device,
+			      struct ib_gid_table *table)
+{
 	if (!table)
 		return;

-	for (i = 0; i < table->sz; i++) {
-		if (is_gid_entry_free(table->data_vec[i]))
-			continue;
-
-		WARN_ONCE(true,
-			  "GID entry ref leak for dev %s index %d ref=%u\n",
-			  dev_name(&device->dev), i,
-			  kref_read(&table->data_vec[i]->kref));
+	if (!is_gid_table_clean(table)) {
+		/*
+		 * The entry may be sitting in the WQ waiting for
+		 * free_gid_work(), flush it to try to clean it.
+		 */
+		flush_workqueue(ib_wq);
+		if (!is_gid_table_clean(table))
+			WARN_ONCE(true, "GID entry has leaked");
 	}

 	mutex_destroy(&table->lock);
Jason
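
For readers who want to experiment with the ordering suggested above
(check, flush, check again, and only then warn), here is a minimal
userspace sketch. It is not the kernel code: struct table, the
ENTRY_* states, GUARD_MUTEX(), flush_deferred() and
release_table_leaked() are hypothetical names made up for this
illustration, a pthread mutex stands in for the kernel mutex, and
flush_deferred() is a synchronous stand-in for flush_workqueue(). The
scope guard relies on the GCC/Clang cleanup attribute, which is
assumed here to behave similarly to the kernel's guard(mutex).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SZ 4

/* Hypothetical entry states, loosely modelled on the GID table states. */
enum entry_state { ENTRY_FREE, ENTRY_VALID, ENTRY_PENDING_DEL };

struct table {
	pthread_mutex_t lock;
	enum entry_state slots[TABLE_SZ];
};

/* Take the mutex and hand the pointer back so it can seed the guard. */
static pthread_mutex_t *lock_and_return(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);

	assert(rc == 0);
	(void)rc;
	return m;
}

/* Runs automatically when the guard variable goes out of scope. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Scope guard (GCC/Clang extension): lock now, unlock on every return. */
#define GUARD_MUTEX(m) \
	pthread_mutex_t *guard_ __attribute__((cleanup(unlock_cleanup))) = \
		lock_and_return(m)

/* True only if every slot is free; the early return still unlocks. */
static bool table_is_clean(struct table *t)
{
	GUARD_MUTEX(&t->lock);
	for (size_t i = 0; i < TABLE_SZ; i++)
		if (t->slots[i] != ENTRY_FREE)
			return false;
	return true;
}

/*
 * Stand-in for flush_workqueue(): completes every pending deletion.
 * It takes the table lock itself, so callers must not hold it.
 */
static void flush_deferred(struct table *t)
{
	GUARD_MUTEX(&t->lock);
	for (size_t i = 0; i < TABLE_SZ; i++)
		if (t->slots[i] == ENTRY_PENDING_DEL)
			t->slots[i] = ENTRY_FREE;
}

/* Returns true only if entries remain after pending work was flushed. */
static bool release_table_leaked(struct table *t)
{
	if (table_is_clean(t))
		return false;

	/* A pending deletion is not a leak, so flush before judging. */
	flush_deferred(t);
	return !table_is_clean(t);
}
```

With this shape, a table holding only ENTRY_PENDING_DEL slots is
reported as clean after the flush, while a slot left in ENTRY_VALID is
still reported as leaked. Neither check runs with the lock held across
the flush, which matters because the deferred work needs the same lock.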