Message-ID: <4a2b6587-7bb1-4fdd-a3c1-6f0c61a84ef7@gmail.com>
Date: Wed, 20 Aug 2025 00:15:51 +0900
X-Mailing-List: linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-rc v1] RDMA/rxe: Avoid CQ polling hang triggered by CQ resize
To: Zhu Yanjun, linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: philipp.reisner@linbit.com
References: <20250817123752.153735-1-dskmtsd@gmail.com>
From: Daisuke Matsuda

On 2025/08/18 13:44, Zhu Yanjun wrote:
> On 2025/8/17 5:37, Daisuke Matsuda wrote:
>> When running the test_resize_cq testcase from rdma-core, polling a
>> completion queue from userspace may occasionally hang and eventually fail
>> with a timeout:
>> =====
>> ERROR: test_resize_cq (tests.test_cq.CQTest.test_resize_cq)
>> Test resize CQ, start with specific value and then increase and decrease
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>     File "/root/deb/rdma-core/tests/test_cq.py", line 135, in test_resize_cq
>>       u.poll_cq(self.client.cq)
>>     File "/root/deb/rdma-core/tests/utils.py", line 687, in poll_cq
>>       wcs = _poll_cq(cq, count, data)
>>             ^^^^^^^^^^^^^^^^^^^^^^^^^
>>     File "/root/deb/rdma-core/tests/utils.py", line 669, in _poll_cq
>>       raise PyverbsError(f'Got timeout on polling ({count} CQEs remaining)')
>> pyverbs.pyverbs_error.PyverbsError: Got timeout on polling (1 CQEs
>> remaining)
>> =====
>>
>> The issue is caused when rxe_cq_post() fails to post a CQE due to the queue
>> being temporarily full, and the CQE is effectively lost. To mitigate this,
>> add a bounded busy-wait with fallback rescheduling so that CQE does not get
>> lost.
>>
>> Signed-off-by: Daisuke Matsuda
>> ---
>>   drivers/infiniband/sw/rxe/rxe_cq.c | 27 +++++++++++++++++++++++++--
>>   1 file changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
>> index fffd144d509e..7b0fba63204e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_cq.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_cq.c
>> @@ -84,14 +84,36 @@ int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe,
>>   /* caller holds reference to cq */
>>   int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
>>   {
>> +    unsigned long flags;
>> +    u32 spin_cnt = 3000;
>>       struct ib_event ev;
>> -    int full;
>>       void *addr;
>> -    unsigned long flags;
>> +    int full;
>>       spin_lock_irqsave(&cq->cq_lock, flags);
>>       full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
>> +    if (likely(!full))
>> +        goto post_queue;
>> +
>> +    /* constant backoff until queue is ready */
>> +    while (spin_cnt--) {
>> +        full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
>> +        if (!full)
>> +            goto post_queue;
>> +
>> +        cpu_relax();
>> +    }
>
> The loop runs 3000 times.
> Each iteration:
>
> Checks queue_full()
> Executes cpu_relax()
>
> On modern CPUs, each iteration may take a few cycles, e.g., 4–10 cycles
> per iteration (depends on memory/cache).
>
> Suppose 1 cycle = ~0.3 ns on a 3 GHz CPU, 10 cycles ≈ 3 ns
> 3000 iterations × 10 cycles = 30,000 cycles
>
> 30000 cycles * 0.3 ns = 9000 ns = 9 microseconds
>
> So the “critical section” while spinning is on the order of ten
> microseconds, not milliseconds.
>
> I was concerned that 3000 iterations might make the spin lock critical
> section too long, but based on the analysis above, it appears that this
> is still a short-duration critical section.

Thank you for the review.

Assuming the two loads in queue_full() hit in the L1 cache, I estimate
each iteration could take around 15–20 cycles. With 20 cycles per
iteration and your figure of 0.3 ns per cycle, the maximum total time
would be approximately 18 microseconds (3000 × 20 × 0.3 ns).

>
> I am not sure if it is a big spin lock critical section or not.
> If it is not,

In my opinion, this duration is acceptable, as the thread does not
actually spin for that long in practice. During my testing, it never
reached the cond_resched() fallback, so the current spin count appears
sufficient to avoid the failure case.

Thanks,
Daisuke

>
> Reviewed-by: Zhu Yanjun
>
> Zhu Yanjun
>
>> +
>> +    /* try giving up cpu and retry */
>> +    if (full) {
>> +        spin_unlock_irqrestore(&cq->cq_lock, flags);
>> +        cond_resched();
>> +        spin_lock_irqsave(&cq->cq_lock, flags);
>> +
>> +        full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
>> +    }
>> +
>>       if (unlikely(full)) {
>>           rxe_err_cq(cq, "queue full\n");
>>           spin_unlock_irqrestore(&cq->cq_lock, flags);
>> @@ -105,6 +127,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
>>           return -EBUSY;
>>       }
>> + post_queue:
>>       addr = queue_producer_addr(cq->queue, QUEUE_TYPE_TO_CLIENT);
>>       memcpy(addr, cqe, sizeof(*cqe));
>
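
As a cross-check of the spin-time estimates in this thread, here is a
minimal userspace sketch in C. It is not part of the patch:
spin_time_ns() and print_spin_estimates() are hypothetical helpers, and
the 3 GHz clock and the per-iteration cycle counts are only the assumed
figures discussed above. It uses the exact cycle time (1/3 ns) rather
than the rounded 0.3 ns, so it gives 10 and 20 microseconds where the
thread arrives at 9 and 18.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of rxe): upper bound, in nanoseconds, on the
 * time spent in a spin loop of `iters` iterations, each costing
 * `cycles_per_iter` CPU cycles, on a core clocked at `mhz` MHz.
 */
uint64_t spin_time_ns(uint64_t iters, uint64_t cycles_per_iter, uint64_t mhz)
{
	assert(mhz > 0);
	/* total cycles / MHz gives microseconds; multiply by 1000 first for ns */
	return iters * cycles_per_iter * 1000 / mhz;
}

/* Print the bound for the per-iteration costs mentioned in the thread. */
void print_spin_estimates(void)
{
	const uint64_t spin_cnt = 3000;	/* spin_cnt from the patch */
	const uint64_t mhz = 3000;	/* assumed 3 GHz CPU */
	const uint64_t costs[] = { 4, 10, 15, 20 };	/* assumed cycles per iteration */

	for (size_t i = 0; i < sizeof(costs) / sizeof(costs[0]); i++)
		printf("%2llu cycles/iter: %llu ns\n",
		       (unsigned long long)costs[i],
		       (unsigned long long)spin_time_ns(spin_cnt, costs[i], mhz));
}
```

Calling print_spin_estimates() from a main() prints one line per assumed
cost. Under these assumptions the bound stays at 20 microseconds even at
20 cycles per iteration; real timings depend on cache behaviour and on
how cpu_relax() is implemented for the architecture.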