* [PATCH] IB/cache: Check GID table references before attempting deletion
From: Chenguang Zhao @ 2026-05-13 8:07 UTC
To: Jason Gunthorpe, Leon Romanovsky
Cc: Chenguang Zhao, Kees Cook, Etienne AUJAMES, zhenwei pi,
Jiri Pirko, Maor Gottlieb, linux-rdma
In an NFS over RDMA environment, repeated ifdown/ifup operations on the
client may cause df -h to hang.
The kernel log reports an error:
__ib_cache_gid_add: unable to add gid
0000:0000:0000:0000:0000:ffff:c0a8:0115 error=-28.
Error code -28 (-ENOSPC) indicates the GID table is full.
The call stack during ifdown is as follows:
put_gid_entry_locked()
del_gid()
_ib_cache_gid_del()
update_gid()
update_gid_event_work_handler()
In put_gid_entry_locked(), kref_put(&entry->kref) does not
drop the reference count to zero, so free_gid_entry()
is never invoked to release the entry. Subsequent ifup
attempts keep adding new entries to the GID table,
eventually exhausting the table capacity.
To fix this, check whether the GID entry still has
outstanding references before calling del_gid(), and only
remove and release the entry when no other references remain.
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
---
drivers/infiniband/core/cache.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 647a547e2d7f..c71522fbf89f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -596,6 +596,34 @@ int ib_cache_gid_add(struct ib_device *ib_dev, u32 port,
 	return __ib_cache_gid_add(ib_dev, port, gid, attr, mask, false);
 }

+/**
+ * gid_table_is_shared - Check if GID table has other reference owners
+ * @table: GID table to check
+ * @ix: index of entry
+ *
+ * Returns true if the gid table refcount is greater than 1,
+ */
+static bool gid_table_is_shared(struct ib_gid_table *table, int ix)
+{
+	unsigned int refcount;
+	struct ib_gid_table_entry *entry;
+
+	write_lock_irq(&table->rwlock);
+
+	entry = table->data_vec[ix];
+	refcount = kref_read(&entry->kref);
+
+	write_unlock_irq(&table->rwlock);
+
+	if (refcount > 1) {
+		pr_debug("%s: The GID table is still referenced and cannot be deleted.\n",
+			 __func__);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 static int
 _ib_cache_gid_del(struct ib_device *ib_dev, u32 port,
 		  union ib_gid *gid, struct ib_gid_attr *attr,
@@ -615,6 +643,9 @@ _ib_cache_gid_del(struct ib_device *ib_dev, u32 port,
 		goto out_unlock;
 	}

+	if (gid_table_is_shared(table, ix))
+		goto out_unlock;
+
 	del_gid(ib_dev, port, table, ix);
 	dispatch_gid_change_event(ib_dev, port);

--
2.25.1
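
For readers decoding the log line in the commit message: the value printed as error=-28 is a negated errno, and on Linux errno 28 is ENOSPC. The helper below is a small userspace sketch for looking up such values; describe_kernel_error() is a hypothetical name and is not part of the patch or of the kernel.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not part of the patch): map a value
 * printed as "error=%d" in a kernel log line to the libc description.
 * Kernel functions return negated errno values, so flip the sign first.
 */
static const char *describe_kernel_error(int err)
{
	return strerror(err < 0 ? -err : err);
}
```

Calling describe_kernel_error(-28) from a small test program returns the libc text for ENOSPC on Linux.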
* Re: [PATCH] IB/cache: Check GID table references before attempting deletion
From: Leon Romanovsky @ 2026-05-17 10:32 UTC
To: Chenguang Zhao
Cc: Jason Gunthorpe, Kees Cook, Etienne AUJAMES, zhenwei pi,
Jiri Pirko, Maor Gottlieb, linux-rdma
On Wed, May 13, 2026 at 04:07:07PM +0800, Chenguang Zhao wrote:
> In the NFS over RDMA environment, repeatedly performing frequent
> ifdown/ifup operations on the client may cause df -h to hang.
> The kernel log reports an error:
> __ib_cache_gid_add: unable to add gid
> 0000:0000:0000:0000:0000:ffff:c0a8:0115 error=-28.
> Error code -28 indicates the GID table is full.
> The call stack during ifdown is as follows:
> put_gid_entry_locked()
> del_gid()
> _ib_cache_gid_del()
> update_gid()
> update_gid_event_work_handler()
>
> In put_gid_entry_locked(), kref_put(&entry->kref) does not
> drop the reference count to zero.
Why?
> so free_gid_entry() is never invoked to release the entry. Subsequent ifup
> attempts keep adding new entries into the GID table,
> eventually exhausting the table capacity.
This behavior is not what we expect from the IB/cache layer.
Thanks
* Re: [PATCH] IB/cache: Check GID table references before attempting deletion
From: Chenguang Zhao @ 2026-05-18 2:36 UTC
To: Leon Romanovsky
Cc: Jason Gunthorpe, Kees Cook, Etienne AUJAMES, zhenwei pi,
Jiri Pirko, Maor Gottlieb, linux-rdma
After calling kref_put(&entry->kref) in put_gid_entry_locked(), the reference count does not drop to zero.
This is because the GID entry is still held by NFS via call paths such as
cma_acquire_dev_by_src_ip() -> cma_validate_port() -> rdma_find_gid_by_port() -> get_gid_entry().
Consequently, the GID entry cannot be freed. Meanwhile, the corresponding GID has already been removed
from the hardware/driver layer via ib_dev->ops.del_gid(). Subsequent ifup attempts keep inserting new entries
into the GID table, and repeated cycles of ifdown and ifup eventually exhaust the entire GID table space.
To resolve this issue, we add a check before removing GID entries in the driver. We forbid the deletion
operation if entry->kref is not equal to 1 (the initial reference count value). With this constraint, existing
valid entries will be detected and reused after ifup, avoiding redundant insertion into the GID table.
Will GID entry deletion lead to inconsistency between driver and IB/cache layers?
Thanks
Chenguang
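
The exhaustion mechanism described above can be sketched as a userspace toy model. This is not the kernel code: the toy_* names, the table size of 4, and the assumption that a consumer takes one reference per ifup and never drops it are all illustrative. A slot counts as free only when its reference count reaches zero, so a leaked consumer reference keeps the slot occupied after ifdown.

```c
#include <assert.h>
#include <errno.h>

#define TOY_TABLE_SIZE 4	/* hypothetical; real table sizes are device-specific */

/* Toy stand-in for a GID table entry; refs == 0 means the slot is free. */
struct toy_entry {
	int refs;
};

struct toy_table {
	struct toy_entry slots[TOY_TABLE_SIZE];
};

/* "ifup": claim a free slot; the table itself holds the initial reference. */
static int toy_add(struct toy_table *t)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++) {
		if (t->slots[i].refs == 0) {
			t->slots[i].refs = 1;
			return i;
		}
	}
	return -ENOSPC;	/* corresponds to error=-28 in the log */
}

/* A consumer takes an extra reference on the entry. */
static void toy_get(struct toy_table *t, int ix)
{
	assert(t->slots[ix].refs > 0);
	t->slots[ix].refs++;
}

/* Drop one reference; the slot is reusable only once the count hits 0. */
static void toy_put(struct toy_table *t, int ix)
{
	assert(t->slots[ix].refs > 0);
	t->slots[ix].refs--;
}

/*
 * Run ifup/ifdown cycles in which a consumer takes a reference after each
 * ifup and never releases it. Returns the cycle index at which toy_add()
 * first fails, or -1 if every cycle succeeds.
 */
static int toy_cycles_until_full(int cycles)
{
	struct toy_table t = { 0 };

	for (int c = 0; c < cycles; c++) {
		int ix = toy_add(&t);	/* ifup */

		if (ix < 0)
			return c;
		toy_get(&t, ix);	/* consumer takes a reference */
		toy_put(&t, ix);	/* ifdown: table drops its own reference */
	}
	return -1;
}
```

With four slots, toy_cycles_until_full(10) returns 4: the first four cycles each leave one slot pinned by the consumer, and the fifth ifup finds no free slot.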
On 2026/5/17 18:32, Leon Romanovsky wrote:
> On Wed, May 13, 2026 at 04:07:07PM +0800, Chenguang Zhao wrote:
>> In the NFS over RDMA environment, repeatedly performing frequent
>> ifdown/ifup operations on the client may cause df -h to hang.
>> The kernel log reports an error:
>> __ib_cache_gid_add: unable to add gid
>> 0000:0000:0000:0000:0000:ffff:c0a8:0115 error=-28.
>> Error code -28 indicates the GID table is full.
>> The call stack during ifdown is as follows:
>> put_gid_entry_locked()
>> del_gid()
>> _ib_cache_gid_del()
>> update_gid()
>> update_gid_event_work_handler()
>>
>> In put_gid_entry_locked(), kref_put(&entry->kref) does not
>> drop the reference count to zero.
> Why?
>
>> so free_gid_entry() is never invoked to release the entry. Subsequent ifup
>> attempts keep adding new entries into the GID table,
>> eventually exhausting the table capacity.
> This behavior is not what we expect from the IB/cache layer.
>
> Thanks
* Re: [PATCH] IB/cache: Check GID table references before attempting deletion
From: Jason Gunthorpe @ 2026-05-18 17:58 UTC
To: Chenguang Zhao
Cc: Leon Romanovsky, Kees Cook, Etienne AUJAMES, zhenwei pi,
Jiri Pirko, Maor Gottlieb, linux-rdma
On Mon, May 18, 2026 at 10:36:57AM +0800, Chenguang Zhao wrote:
> After calling kref_put(&entry->kref) in put_gid_entry_locked(), the reference count does not drop to zero.
> This is because the GID entry is still held by NFS via call paths such as
> cma_acquire_dev_by_src_ip() -> cma_validate_port() -> rdma_find_gid_by_port() -> get_gid_entry().
> Consequently, the GID entry cannot be freed. Meanwhile, the corresponding GID has already been removed
> from hardware/driver layer via ib_dev->ops.del_gid(). Subsequent ifup attempts keep inserting new entries
> into the GID table, and repeated cycles of ifdown and ifup eventually exhaust the entire GID table space.
This is a bug in NFS/etc to hold on to GID entries forever.
> To resolve this issue, we add a check before removing GID entries in the driver. We forbid the deletion
> operation if entry->kref is not equal to 1 (the initial reference count value). With this constraint, existing
> valid entries will be detected and reused after ifup, avoiding redundant insertion into the GID table.
Definitely no, you cannot keep GID entries alive that are removed from
the netdev.
Jason
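
The objection above can be illustrated with another userspace toy model, again an assumption-laden sketch and not kernel code. toy_del_if_unshared() mimics the proposed check by skipping deletion while the count is above 1; the toy_* names and the single-entry "table" are hypothetical. The address value is taken from the GID in the reported log line (c0a8:0115).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy entry: addr identifies the GID; refs == 0 means the slot is free. */
struct toy_gid {
	unsigned int addr;
	int refs;
};

/* Deletion as proposed in the patch: do nothing while extra references exist. */
static bool toy_del_if_unshared(struct toy_gid *e)
{
	assert(e->refs > 0);
	if (e->refs > 1)
		return false;	/* entry stays fully visible in the table */
	e->refs = 0;
	return true;
}

/* Lookup as seen by any other user of the table. */
static bool toy_lookup(const struct toy_gid *e, unsigned int addr)
{
	return e->refs > 0 && e->addr == addr;
}

/*
 * The address is removed from the netdev while consumer_refs extra
 * references are held. Returns true if the table still reports the
 * removed address afterwards, i.e. the entry has gone stale.
 */
static bool toy_stale_after_ifdown(int consumer_refs)
{
	struct toy_gid e = { .addr = 0xc0a80115, .refs = 1 + consumer_refs };

	toy_del_if_unshared(&e);	/* ifdown removes the address */
	return toy_lookup(&e, 0xc0a80115);
}
```

In this model toy_stale_after_ifdown(1) is true: one outstanding reference is enough for the table to keep advertising an address the netdev no longer has, whereas toy_stale_after_ifdown(0) is false.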