Linux RDMA and InfiniBand development
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Joe Klein <joe.klein812@gmail.com>
Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH 1/1] RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"
Date: Tue, 14 Jan 2025 10:21:19 +0100
Message-ID: <0fb49f1e-e8a8-42fb-b448-2e168a8b2940@linux.dev>
In-Reply-To: <CAHjRaAeCAUw3WGjKxvFqT_5XCTut-LbnrTKgPpLshn1jmH50Pg@mail.gmail.com>

On 13.01.25 14:28, Joe Klein wrote:
> On Fri, Jan 10, 2025 at 5:09 PM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>
>> The Call Trace is as below:
>> "
>>    <TASK>
>>    ? show_regs.cold+0x1a/0x1f
>>    ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>>    ? __warn+0x84/0xd0
>>    ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>>    ? report_bug+0x105/0x180
>>    ? handle_bug+0x46/0x80
>>    ? exc_invalid_op+0x19/0x70
>>    ? asm_exc_invalid_op+0x1b/0x20
>>    ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
>>    ? __rxe_cleanup+0x124/0x170 [rdma_rxe]
>>    rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe]
>>    ib_destroy_qp_user+0x118/0x190 [ib_core]
>>    rdma_destroy_qp.cold+0x43/0x5e [rdma_cm]
>>    rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core]
>>    rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server]
>>    process_one_work+0x21d/0x3f0
>>    worker_thread+0x4a/0x3c0
>>    ? process_one_work+0x3f0/0x3f0
>>    kthread+0xf0/0x120
>>    ? kthread_complete_and_exit+0x20/0x20
>>    ret_from_fork+0x22/0x30
>>    </TASK>
>> "
>> When too many RDMA resources are allocated, rxe needs more time to
>> release them. With the current timeout, rxe sometimes cannot release
>> the RDMA resources before the wait expires, which triggers the
>> warning above.
>>
>> Compared with other RDMA drivers, a bigger timeout is used.
>>
>> Fixes: 215d0a755e1b ("RDMA/rxe: Stop lookup of partially built objects")
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> We tested this patch. All of our tests pass with it applied.


Thanks a lot. Appreciate your testing.

Zhu Yanjun

> 
> Tested-by: Joe Klein <joe.klein812@gmail.com>
> 
>> ---
>>   drivers/infiniband/sw/rxe/rxe_pool.c | 11 +++++------
>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index 67567d62195e..d9cb682fd71f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -178,7 +178,6 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>   {
>>          struct rxe_pool *pool = elem->pool;
>>          struct xarray *xa = &pool->xa;
>> -       static int timeout = RXE_POOL_TIMEOUT;
>>          int ret, err = 0;
>>          void *xa_ret;
>>
>> @@ -202,19 +201,19 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>           * return to rdma-core
>>           */
>>          if (sleepable) {
>> -               if (!completion_done(&elem->complete) && timeout) {
>> +               if (!completion_done(&elem->complete)) {
>>                          ret = wait_for_completion_timeout(&elem->complete,
>> -                                       timeout);
>> +                                       msecs_to_jiffies(50000));
>>
>>                          /* Shouldn't happen. There are still references to
>>                           * the object but, rather than deadlock, free the
>>                           * object or pass back to rdma-core.
>>                           */
>>                          if (WARN_ON(!ret))
>> -                               err = -EINVAL;
>> +                               err = -ETIMEDOUT;
>>                  }
>>          } else {
>> -               unsigned long until = jiffies + timeout;
>> +               unsigned long until = jiffies + RXE_POOL_TIMEOUT;
>>
>>                  /* AH objects are unique in that the destroy_ah verb
>>                   * can be called in atomic context. This delay
>> @@ -226,7 +225,7 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>                          mdelay(1);
>>
>>                  if (WARN_ON(!completion_done(&elem->complete)))
>> -                       err = -EINVAL;
>> +                       err = -ETIMEDOUT;
>>          }
>>
>>          if (pool->cleanup)
>> --
>> 2.34.1
>>
>>

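For readers following the thread, the effect of the patch on the sleepable
path can be sketched in userspace C. This is only a simplified model under
stated assumptions, not the driver code: ms_to_ticks() is a hypothetical
stand-in for msecs_to_jiffies() (rounding up, with hz standing in for
CONFIG_HZ), and bounded_wait() models the wait as a comparison between the
tick at which the last reference is dropped and the timeout. The tick
values used in the note below are made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified model of a milliseconds-to-ticks conversion.
 * Rounds up so the wait is never shorter than requested.
 * hz is a hypothetical stand-in for the kernel's CONFIG_HZ.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (unsigned long)(((unsigned long long)ms * hz + 999) / 1000);
}

/*
 * Simplified model of a bounded wait.
 * released_at is the (hypothetical) tick at which the last reference
 * to the object is dropped. Returns 0 if that happens within the
 * timeout, -ETIMEDOUT otherwise, mirroring the error code the patch
 * switches to.
 */
static int bounded_wait(unsigned long released_at, unsigned long timeout_ticks)
{
	return released_at <= timeout_ticks ? 0 : -ETIMEDOUT;
}
```

With hz = 250, ms_to_ticks(50000, 250) gives 12500 ticks. An object whose
last reference is dropped at tick 300 would time out against an
illustrative 200-tick limit, but completes well within the 12500-tick
limit, so no warning would be raised in this model.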

Thread overview: 4+ messages
2025-01-10 16:09 [PATCH 1/1] RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]" Zhu Yanjun
2025-01-13 13:28 ` Joe Klein
2025-01-14  9:21   ` Zhu Yanjun [this message]
2025-01-14 11:44 ` Leon Romanovsky
