From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Daisuke Matsuda <dskmtsd@gmail.com>,
	linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca,
	zyjzyj2000@gmail.com
Cc: philipp.reisner@linbit.com
Subject: Re: [PATCH for-rc v1] RDMA/rxe: Avoid CQ polling hang triggered by CQ resize
Date: Sun, 17 Aug 2025 21:44:16 -0700
Message-ID: <f764f4ae-91c2-4e22-8380-9a8dd144d0c1@linux.dev>
In-Reply-To: <20250817123752.153735-1-dskmtsd@gmail.com>

On 2025/8/17 5:37, Daisuke Matsuda wrote:
> When running the test_resize_cq testcase from rdma-core, polling a
> completion queue from userspace may occasionally hang and eventually fail
> with a timeout:
> =====
> ERROR: test_resize_cq (tests.test_cq.CQTest.test_resize_cq)
> Test resize CQ, start with specific value and then increase and decrease
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>      File "/root/deb/rdma-core/tests/test_cq.py", line 135, in test_resize_cq
>        u.poll_cq(self.client.cq)
>      File "/root/deb/rdma-core/tests/utils.py", line 687, in poll_cq
>        wcs = _poll_cq(cq, count, data)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^
>      File "/root/deb/rdma-core/tests/utils.py", line 669, in _poll_cq
>        raise PyverbsError(f'Got timeout on polling ({count} CQEs remaining)')
> pyverbs.pyverbs_error.PyverbsError: Got timeout on polling (1 CQEs
> remaining)
> =====
> 
> The issue is caused when rxe_cq_post() fails to post a CQE due to the queue
> being temporarily full, and the CQE is effectively lost. To mitigate this,
> add a bounded busy-wait with fallback rescheduling so that CQE does not get
> lost.
> 
> Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_cq.c | 27 +++++++++++++++++++++++++--
>   1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
> index fffd144d509e..7b0fba63204e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_cq.c
> +++ b/drivers/infiniband/sw/rxe/rxe_cq.c
> @@ -84,14 +84,36 @@ int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe,
>   /* caller holds reference to cq */
>   int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
>   {
> +	unsigned long flags;
> +	u32 spin_cnt = 3000;
>   	struct ib_event ev;
> -	int full;
>   	void *addr;
> -	unsigned long flags;
> +	int full;
>   
>   	spin_lock_irqsave(&cq->cq_lock, flags);
>   
>   	full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
> +	if (likely(!full))
> +		goto post_queue;
> +
> +	/* constant backoff until queue is ready */
> +	while (spin_cnt--) {
> +		full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
> +		if (!full)
> +			goto post_queue;
> +
> +		cpu_relax();
> +	}

The loop runs at most 3000 times. Each iteration:

  checks queue_full()
  executes cpu_relax()

On modern CPUs, each iteration may take only a few cycles, e.g., 4–10
cycles, depending on memory/cache behavior and on how cpu_relax() is
implemented for the architecture.

Suppose 1 cycle = ~0.3 ns on a 3 GHz CPU, so 10 cycles ≈ 3 ns:

  3000 iterations × 10 cycles = 30,000 cycles
  30,000 cycles × 0.3 ns = 9000 ns = 9 microseconds

So the time spent spinning inside the critical section is on the order
of ten microseconds, not milliseconds.

I was concerned that 3000 iterations might make the spin lock critical
section too long. Based on the analysis above, it still appears to be a
short-duration critical section, although I am not certain the estimate
holds on every platform. If the spin time is acceptable,

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

> +
> +	/* try giving up cpu and retry */
> +	if (full) {
> +		spin_unlock_irqrestore(&cq->cq_lock, flags);
> +		cond_resched();
> +		spin_lock_irqsave(&cq->cq_lock, flags);
> +
> +		full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
> +	}
> +
>   	if (unlikely(full)) {
>   		rxe_err_cq(cq, "queue full\n");
>   		spin_unlock_irqrestore(&cq->cq_lock, flags);
> @@ -105,6 +127,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
>   		return -EBUSY;
>   	}
>   
> + post_queue:
>   	addr = queue_producer_addr(cq->queue, QUEUE_TYPE_TO_CLIENT);
>   	memcpy(addr, cqe, sizeof(*cqe));
>   


