From: Paolo Abeni <pabeni@redhat.com>
To: alibuda@linux.alibaba.com
Cc: davem@davemloft.net, dust.li@linux.alibaba.com,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
sidraya@linux.ibm.com, wenjia@linux.ibm.com,
mjambigi@linux.ibm.com, horms@kernel.org,
tonylu@linux.alibaba.com, guwen@linux.alibaba.com,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-s390@vger.kernel.org, netdev@vger.kernel.org,
oliver.yang@linux.alibaba.com, pasic@linux.ibm.com
Subject: Re: [PATCH net-next 2/2] net/smc: reduce TX slot contention with exclusive wait
Date: Tue, 12 May 2026 10:26:43 +0200
Message-ID: <20260512082643.89020-1-pabeni@redhat.com>
In-Reply-To: <20260508063718.101622-3-alibuda@linux.alibaba.com>
From: AI Reviewer <ai@example.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net/smc: reduce TX slot contention with exclusive wait
This patch replaces the non-exclusive wait_event_interruptible_timeout()
in smc_wr_tx_get_free_slot() with an open-coded exclusive wait loop via
prepare_to_wait_exclusive(), so wake_up() on a freed TX slot wakes only
a single waiter instead of the entire wait queue.
> diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
> index 48037a3d97a3..0a6f2befb0e2 100644
> --- a/net/smc/smc_wr.c
> +++ b/net/smc/smc_wr.c
> @@ -171,17 +173,27 @@ int smc_wr_tx_get_free_slot(struct smc_link *link,
>  		if (rc)
>  			return rc;
>  	} else {
> -		rc = wait_event_interruptible_timeout(
> -			link->wr_tx_wait,
> -			!smc_link_sendable(link) ||
> -			lgr->terminating ||
> -			(smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY),
> -			SMC_WR_TX_WAIT_FREE_SLOT_TIME);
> -		if (!rc) {
> -			/* timeout - terminate link */
> -			smcr_link_down_cond_sched(link);
> -			return -EPIPE;
> +		rc = 0;
> +		for (;;) {
> +			prepare_to_wait_exclusive(&link->wr_tx_wait, &wait,
> +						  TASK_INTERRUPTIBLE);
> +			if (!smc_link_sendable(link) || lgr->terminating ||
> +			    smc_wr_tx_get_free_slot_index(link, &idx) != -EBUSY)
> +				break;
> +			timeout = schedule_timeout(timeout);
> +			if (!timeout) {
> +				/* timeout - terminate link */
> +				smcr_link_down_cond_sched(link);
> +				break;
> +			}
> +			if (signal_pending(current)) {
> +				rc = -ERESTARTSYS;
> +				break;
> +			}
[High]
With exclusive waiters, wake_up() on link->wr_tx_wait wakes only one
task per freed slot. If that waiter is about to exit via the
signal_pending() branch, can the freed slot be lost to the remaining
sleepers?
Consider two exclusive waiters A and B asleep in schedule_timeout():
  1. smc_wr_tx_put_slot() -> wake_up(&link->wr_tx_wait) is effectively
     __wake_up(..., nr_exclusive=1). default_wake_function makes A
     runnable, autoremove_wake_function pulls A off the queue, and
     __wake_up_common stops -- B is not woken.
  2. A signal is delivered to A before it actually runs. Because A is
     already TASK_RUNNING, signal_wake_up only sets TIF_SIGPENDING.
  3. A returns from schedule_timeout() with timeout > 0, so the
     !timeout branch is skipped. signal_pending(current) is true, so
     the loop sets rc = -ERESTARTSYS and breaks without re-testing
     smc_wr_tx_get_free_slot_index() and without issuing any wake_up()
     on the way out.
  4. B stays asleep on the wait queue even though the wr_tx_mask bit
     for the freed slot is still available. With no further
     completions, B sleeps until the full SMC_WR_TX_WAIT_FREE_SLOT_TIME
     expires, at which point smcr_link_down_cond_sched() tears down an
     otherwise healthy link.
The previous wait_event_interruptible_timeout() was non-exclusive, so
every waiter was woken on each wake_up() and re-evaluated the
condition, which avoided this class of lost wakeup.
Should the signal path re-check the slot condition before returning,
or issue a wake_up() on exit so another exclusive waiter gets a chance
at the slot?
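
To make the sequence in steps 1-4 concrete, here is a small userspace
model of it. This is only a sketch under stated assumptions, not kernel
code and not a proposed patch: every model_* name is hypothetical, the
wait queue is a plain array, and the "handoff" flag stands in for the
idea of issuing a wake_up() on the signal exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 4
#define MODEL_ERESTARTSYS 512	/* value of the kernel-internal ERESTARTSYS */

/* Hypothetical stand-ins for a sleeping task and for the SMC link. */
struct model_waiter {
	bool signal_pending;	/* stands in for TIF_SIGPENDING */
	bool has_slot;
	int rc;
};

struct model_link {
	int free_slots;
	struct model_waiter *queue[MODEL_MAX_WAITERS];
	size_t nr_queued;
};

static void model_enqueue(struct model_link *l, struct model_waiter *w)
{
	assert(l->nr_queued < MODEL_MAX_WAITERS);
	l->queue[l->nr_queued++] = w;
}

/* wake_up() with nr_exclusive == 1: dequeue and return only the head. */
static struct model_waiter *model_wake_one(struct model_link *l)
{
	struct model_waiter *w;
	size_t i;

	if (l->nr_queued == 0)
		return NULL;
	w = l->queue[0];
	for (i = 1; i < l->nr_queued; i++)
		l->queue[i - 1] = l->queue[i];
	l->nr_queued--;
	return w;
}

/*
 * What a woken waiter does once it runs with time remaining. The signal
 * check comes first, as in the patch. Returns the next waiter woken by
 * a hand-off, or NULL if nobody else was woken.
 */
static struct model_waiter *model_resume(struct model_link *l,
					 struct model_waiter *w, bool handoff)
{
	if (w->signal_pending) {
		w->rc = -MODEL_ERESTARTSYS;
		return handoff ? model_wake_one(l) : NULL;
	}
	if (l->free_slots > 0) {
		l->free_slots--;
		w->has_slot = true;
	} else {
		model_enqueue(l, w);	/* nothing free: go back to sleep */
	}
	return NULL;
}

/* Replays steps 1-4; true means a free slot coexists with a sleeper. */
bool model_slot_stranded(bool handoff)
{
	struct model_link link = { 0 };
	struct model_waiter a = { 0 }, b = { 0 };
	struct model_waiter *woken;

	model_enqueue(&link, &a);
	model_enqueue(&link, &b);

	link.free_slots++;		/* step 1: a slot is freed ... */
	woken = model_wake_one(&link);	/* ... and only A is woken */
	woken->signal_pending = true;	/* step 2: signal hits A */

	while (woken)			/* steps 3 and 4 */
		woken = model_resume(&link, woken, handoff);

	return link.free_slots > 0 && link.nr_queued > 0;
}
```

In this model, model_slot_stranded(false) reports the stranded slot
described above, while model_slot_stranded(true) lets B claim it. The
model ignores timeouts and real concurrency, so it illustrates the
ordering problem only.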
[Medium]
The commit message says this is a contention optimization and that
"teardown behavior is unchanged", but does it also change the errno
returned to callers on signal delivery?
Before this patch, wait_event_interruptible_timeout() returned
-ERESTARTSYS on signal; that value is non-zero, so the old if (!rc)
branch was skipped, control fell through to the trailing
if (idx == link->wr_tx_cnt) return -EPIPE, and callers saw -EPIPE.
After the patch, the signal path does:
	if (signal_pending(current)) {
		rc = -ERESTARTSYS;
		break;
	}
	...
	finish_wait(&link->wr_tx_wait, &wait);
	if (rc)
		return rc;
so -ERESTARTSYS propagates up through smc_cdc_get_free_slot() (which
only remaps rc to -EPIPE when conn->killed is set),
smcr_cdc_get_slot_and_msg_send(), smc_cdc_get_slot_and_msg_send() and
smc_tx_sendmsg(), meaning userspace sees syscall restart / -EINTR
instead of -EPIPE.
Note that the sibling smc_wr_reg_send() in the same file explicitly
converts -ERESTARTSYS to -EINTR. Should this return-code change be
called out in the commit message (and possibly split out), given that
the stated intent of the patch is purely to reduce contention?
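
The difference can be summarized with a userspace model of the two
return paths. This is a sketch only: the model_* names and the
wait_outcome enum are made up for illustration, and it assumes that idx
equals link->wr_tx_cnt whenever no slot was claimed, as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_ERESTARTSYS 512	/* value of the kernel-internal ERESTARTSYS */

/* Hypothetical summary of how the wait ended. */
enum wait_outcome { WAIT_GOT_SLOT, WAIT_TIMED_OUT, WAIT_SIGNALLED };

/* Pre-patch: rc comes straight from wait_event_interruptible_timeout(). */
int model_old_rc(enum wait_outcome o)
{
	bool idx_valid = (o == WAIT_GOT_SLOT);
	long rc;

	assert(o == WAIT_GOT_SLOT || o == WAIT_TIMED_OUT ||
	       o == WAIT_SIGNALLED);
	if (o == WAIT_TIMED_OUT)
		rc = 0;
	else if (o == WAIT_SIGNALLED)
		rc = -MODEL_ERESTARTSYS;
	else
		rc = 1;			/* any positive remaining time */

	if (!rc)
		return -EPIPE;		/* timeout: link is torn down */
	if (!idx_valid)
		return -EPIPE;		/* idx == link->wr_tx_cnt */
	return 0;
}

/* Post-patch: rc is set only on the signal path and returned as-is. */
int model_new_rc(enum wait_outcome o)
{
	bool idx_valid = (o == WAIT_GOT_SLOT);
	int rc = (o == WAIT_SIGNALLED) ? -MODEL_ERESTARTSYS : 0;

	if (rc)
		return rc;		/* new: -ERESTARTSYS escapes */
	if (!idx_valid)
		return -EPIPE;
	return 0;
}
```

Only the WAIT_SIGNALLED case differs between the two functions; the
timeout and success cases map to the same value before and after.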
[Low]
On timeout expiry the new loop takes the teardown path without
re-testing whether a slot became available:
> +			timeout = schedule_timeout(timeout);
> +			if (!timeout) {
> +				/* timeout - terminate link */
> +				smcr_link_down_cond_sched(link);
> +				break;
> +			}
The old wait_event_interruptible_timeout() expansion re-evaluated the
condition via ___wait_cond_timeout after schedule_timeout() returned
0, so a wake_up() racing with timer expiry could still report success
and let smc_wr_tx_get_free_slot_index() claim the slot.
In the narrow race where a slot is freed at the same instant the
timer fires, does this now tear the link down even though a slot was
actually available?
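
As a rough illustration of the difference at timer expiry, the
following userspace sketch mirrors the ___wait_cond_timeout semantics
as a plain function and compares the two behaviors. The model_* names
and the enum are hypothetical, and "slot_free" stands in for the whole
wait condition being true at the moment schedule_timeout() returns 0.

```c
#include <assert.h>
#include <stdbool.h>

enum expiry_outcome { EXPIRY_SLOT_CLAIMED, EXPIRY_LINK_DOWN };

/*
 * Same logic as ___wait_cond_timeout: if the condition holds when the
 * remaining time is 0, bump it to 1 so the caller reports success.
 * Returns true when the wait loop should stop.
 */
static bool model_wait_cond_timeout(bool cond, long *remaining)
{
	assert(*remaining >= 0);
	if (cond && !*remaining)
		*remaining = 1;
	return cond || !*remaining;
}

/* Old behavior: the condition is re-evaluated after expiry. */
enum expiry_outcome model_old_on_expiry(bool slot_free)
{
	long remaining = 0;	/* schedule_timeout() returned 0 */

	model_wait_cond_timeout(slot_free, &remaining);
	return remaining ? EXPIRY_SLOT_CLAIMED : EXPIRY_LINK_DOWN;
}

/* New loop: the !timeout branch tears down without a re-test. */
enum expiry_outcome model_new_on_expiry(bool slot_free)
{
	(void)slot_free;	/* not consulted on this path */
	return EXPIRY_LINK_DOWN;
}
```

The two functions disagree only when slot_free is true at expiry, which
is the narrow race in question.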
>  		}
> +		finish_wait(&link->wr_tx_wait, &wait);
> +		if (rc)
> +			return rc;
>  		if (idx == link->wr_tx_cnt)
>  			return -EPIPE;
>  	}
--
This is an AI-generated review.