From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Joe Klein <joe.klein812@gmail.com>
Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
linux-rdma@vger.kernel.org
Subject: Re: [PATCH 1/1] RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"
Date: Tue, 14 Jan 2025 10:21:19 +0100
Message-ID: <0fb49f1e-e8a8-42fb-b448-2e168a8b2940@linux.dev>
In-Reply-To: <CAHjRaAeCAUw3WGjKxvFqT_5XCTut-LbnrTKgPpLshn1jmH50Pg@mail.gmail.com>

On 13.01.25 14:28, Joe Klein wrote:
> On Fri, Jan 10, 2025 at 5:09 PM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>
>> The Call Trace is as below:
>> "
>> <TASK>
>> ? show_regs.cold+0x1a/0x1f
>> ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>> ? __warn+0x84/0xd0
>> ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>> ? report_bug+0x105/0x180
>> ? handle_bug+0x46/0x80
>> ? exc_invalid_op+0x19/0x70
>> ? asm_exc_invalid_op+0x1b/0x20
>> ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>> ? __rxe_cleanup+0x124/0x170 [rdma_rxe]
>> rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe]
>> ib_destroy_qp_user+0x118/0x190 [ib_core]
>> rdma_destroy_qp.cold+0x43/0x5e [rdma_cm]
>> rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core]
>> rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server]
>> process_one_work+0x21d/0x3f0
>> worker_thread+0x4a/0x3c0
>> ? process_one_work+0x3f0/0x3f0
>> kthread+0xf0/0x120
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x22/0x30
>> </TASK>
>> "
>> When too many RDMA resources are allocated, rxe needs more time to
>> handle them. With the current timeout, rxe sometimes cannot release
>> the RDMA resources before the wait expires.
>>
>> Compared with other rdma drivers, a bigger timeout is used.
>>
>> Fixes: 215d0a755e1b ("RDMA/rxe: Stop lookup of partially built objects")
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> We tested this patch. All tests pass with it.
Thanks a lot. Appreciate your testing.
Zhu Yanjun
>
> Tested-by: Joe Klein <joe.klein812@gmail.com>
>
>> ---
>> drivers/infiniband/sw/rxe/rxe_pool.c | 11 +++++------
>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index 67567d62195e..d9cb682fd71f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -178,7 +178,6 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>> {
>> struct rxe_pool *pool = elem->pool;
>> struct xarray *xa = &pool->xa;
>> - static int timeout = RXE_POOL_TIMEOUT;
>> int ret, err = 0;
>> void *xa_ret;
>>
>> @@ -202,19 +201,19 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>> * return to rdma-core
>> */
>> if (sleepable) {
>> - if (!completion_done(&elem->complete) && timeout) {
>> + if (!completion_done(&elem->complete)) {
>> ret = wait_for_completion_timeout(&elem->complete,
>> - timeout);
>> + msecs_to_jiffies(50000));
>>
>> /* Shouldn't happen. There are still references to
>> * the object but, rather than deadlock, free the
>> * object or pass back to rdma-core.
>> */
>> if (WARN_ON(!ret))
>> - err = -EINVAL;
>> + err = -ETIMEDOUT;
>> }
>> } else {
>> - unsigned long until = jiffies + timeout;
>> + unsigned long until = jiffies + RXE_POOL_TIMEOUT;
>>
>> /* AH objects are unique in that the destroy_ah verb
>> * can be called in atomic context. This delay
>> @@ -226,7 +225,7 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>> mdelay(1);
>>
>> if (WARN_ON(!completion_done(&elem->complete)))
>> - err = -EINVAL;
>> + err = -ETIMEDOUT;
>> }
>>
>> if (pool->cleanup)
>> --
>> 2.34.1
>>
>>
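
For readers following the hunks above: the old code passed a raw jiffies
count (RXE_POOL_TIMEOUT) to wait_for_completion_timeout(), so the actual
wait depended on the kernel's HZ setting, while the patched sleepable path
passes msecs_to_jiffies(50000), which is 50 seconds at any HZ. The userspace
sketch below only illustrates that arithmetic. The helper names are
hypothetical, the conversion is a simplified stand-in for the kernel helper,
and the value 200 for RXE_POOL_TIMEOUT is an assumption, since the patch does
not show its definition.

```c
#include <stdio.h>

/* ASSUMPTION: the patch does not show RXE_POOL_TIMEOUT's value; 200 is
 * used here purely for illustration. */
#define ASSUMED_POOL_TIMEOUT_JIFFIES 200UL

/* Simplified userspace model of msecs_to_jiffies(): round up so that a
 * nonzero delay never becomes zero ticks. The real kernel helper also
 * handles overflow and negative input; this sketch does not. */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999UL) / 1000UL;
}

/* Inverse direction: how many milliseconds a raw jiffies count lasts
 * at a given tick rate. */
static unsigned long sketch_jiffies_to_msecs(unsigned long j, unsigned long hz)
{
	return j * 1000UL / hz;
}

/* Print both timeouts for one tick rate so the HZ dependence is visible. */
static void sketch_print_timeouts(unsigned long hz)
{
	printf("HZ=%lu: raw %lu jiffies = %lu ms, 50000 ms = %lu jiffies\n",
	       hz, ASSUMED_POOL_TIMEOUT_JIFFIES,
	       sketch_jiffies_to_msecs(ASSUMED_POOL_TIMEOUT_JIFFIES, hz),
	       sketch_msecs_to_jiffies(50000UL, hz));
}
```

Calling sketch_print_timeouts() with 100, 250 and 1000 shows the raw-jiffies
timeout shrinking as HZ grows, while the millisecond-based one stays at a
fixed wall-clock duration.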