From: Dust Li <dust.li@linux.alibaba.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Paolo Abeni <pabeni@redhat.com>, Jakub Kicinski <kuba@kernel.org>,
Simon Horman <horms@kernel.org>,
"D. Wythe" <alibuda@linux.alibaba.com>,
Sidraya Jayagond <sidraya@linux.ibm.com>,
Wenjia Zhang <wenjia@linux.ibm.com>,
Mahanta Jambigi <mjambigi@linux.ibm.com>,
Tony Lu <tonylu@linux.alibaba.com>,
Wen Gu <guwen@linux.alibaba.com>,
netdev@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-s390@vger.kernel.org
Subject: Re: [PATCH net-next v3 1/2] net/smc: make wr buffer count configurable
Date: Sun, 28 Sep 2025 19:42:54 +0800
Message-ID: <aNkfPqTyQxYTusKw@linux.alibaba.com>
In-Reply-To: <20250928103951.6464dfd3.pasic@linux.ibm.com>
On 2025-09-28 10:39:51, Halil Pasic wrote:
>On Sun, 28 Sep 2025 10:02:43 +0800
>Dust Li <dust.li@linux.alibaba.com> wrote:
>
>> >Unfortunately I don't quite understand why qp_attr.cap.max_send_wr is 3
>> >times the number of send WR buffers we allocate. My understanding
>> >is that qp_attr.cap.max_send_wr is about the number of send WQEs.
>>
>> We have at most 2 RDMA Write for 1 RDMA send. So 3 times is necessary.
>> That is explained in the original comments. Maybe it's better to keep it.
>>
>> ```
>> .cap = {
>>     /* include unsolicited rdma_writes as well,
>>      * there are max. 2 RDMA_WRITE per 1 WR_SEND
>>      */
>
>But what are "the unsolicited" rdma_writes? I have heard of
>unsolicited receive, where the data is received without
>consuming a WR previously put on the RQ on the receiving end, but
>the concept of unsolicited rdma_writes eludes me completely.

Unsolicited RDMA Writes are those RDMA Writes that won't generate
CQEs on the local side. You can refer to:

https://www.rdmamojo.com/2014/05/27/solicited-event/

>
>I guess what you are trying to say, and what I understand is
>that we first put the payload into the RMB of the remote, which
>may require up to 2 RDMA_WRITE operations, probably because we may
>cross the end (and start) of the array that hosts the circular
>buffer, and then we send a CDC message to update the cursor.
>
>For the latter a ib_post_send() is used in smc_wr_tx_send()
>and AFAICT it consumes a WR from wr_tx_bufs. For the former
>we consume a single wr_tx_rdmas, and each wr_tx_rdmas
>has 2 WRs allocated.
Right.
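
To make the factor of 3 concrete, here is a minimal user-space sketch
of that accounting. It is not the kernel code: struct ring_chunk,
ring_split() and max_send_wqes() are made-up names for illustration,
and it only models a payload that may wrap once around the end of the
array hosting the circular buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, not from net/smc: one contiguous piece of a
 * write into a ring buffer.
 */
struct ring_chunk {
	size_t offset;	/* start offset inside the ring */
	size_t len;	/* number of bytes in this piece */
};

/* Split a write of len bytes starting at pos into contiguous pieces.
 * Returns the number of pieces (0, 1 or 2). In this model each piece
 * would need its own RDMA_WRITE work request.
 */
static int ring_split(size_t ring_size, size_t pos, size_t len,
		      struct ring_chunk out[2])
{
	size_t first;

	assert(ring_size > 0 && pos < ring_size && len <= ring_size);
	if (len == 0)
		return 0;

	first = ring_size - pos;	/* room left before the end */
	if (len <= first) {
		out[0] = (struct ring_chunk){ pos, len };
		return 1;
	}
	/* The payload crosses the end of the array and continues at 0. */
	out[0] = (struct ring_chunk){ pos, first };
	out[1] = (struct ring_chunk){ 0, len - first };
	return 2;
}

/* Worst-case number of send WQEs for buf_cnt messages in flight:
 * up to two RDMA_WRITEs plus one send (the CDC message) per buffer.
 */
static unsigned int max_send_wqes(unsigned int buf_cnt)
{
	return buf_cnt * (2 + 1);
}
```

For example, with a 16-byte ring, writing 8 bytes at offset 12 gives
two pieces (4 bytes at offset 12 and 4 bytes at offset 0), and 16
buffers would need room for up to 48 send WQEs.
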
>
>And all those WRs need a WQE. So I guess now I do understand
>SMC_WR_BUF_CNT, but I find the comment still confusing like
>hell because of these unsolicited rdma_writes.
>
>Thank you for the explanation! It was indeed helpful! Let
>me try to come up with a better comment -- unless somebody
>manages to explain "unsolicited rdma_writes" to me.
>
>>     .max_send_wr = SMC_WR_BUF_CNT * 3,
>>     .max_recv_wr = SMC_WR_BUF_CNT * 3,
>>     .max_send_sge = SMC_IB_MAX_SEND_SGE,
>>     .max_recv_sge = lnk->wr_rx_sge_cnt,
>>     .max_inline_data = 0,
>> },
>> ```
>>
>> >I assume that qp_attr.cap.max_send_wr == qp_attr.cap.max_recv_wr
>> >is not something we would want to preserve.
>>
>> IIUC, RDMA Write won't consume any RX wqe on the receive side, so I think
>> the .max_recv_wr can be SMC_WR_BUF_CNT if we don't use RDMA_WRITE_IMM.
>
>Maybe we don't want to assume somebody else (another implementation)
>would not use immediate data. I'm not sure. But I don't quite understand
>the relationship between the send and the receive side either.

I missed something here. I sent another email right after this one to
explain my thoughts:

I kept thinking about this a bit more, and I realized that max_recv_wr
should be larger than SMC_WR_BUF_CNT.

Since receive WQEs are posted in softirq context, their posting may be
delayed. Meanwhile, the sender might already have received the TX
completion (CQE) and continue sending new messages. In this case, if the
receiver’s post_recv() (i.e., posting of RX WQEs) is delayed, an RNR
(Receiver Not Ready) can easily occur.
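
A toy model of that race, as a rough sketch only: count_rnr() and its
parameters are hypothetical names, not anything in net/smc. It assumes
the sender is never throttled by the receiver (its TX completions
arrive independently of the receiver's reposting) and that the
receiver reposts all consumed RX WQEs in one go every repost_delay
messages. Real hardware would retry after an RNR NAK; the model only
counts how often a message finds the receive queue empty.

```c
/* Count incoming messages that find no posted receive WQE.
 *
 * rq_depth     - receive WQEs posted initially (think max_recv_wr)
 * msgs         - number of messages the sender pushes out
 * repost_delay - the receiver reposts all consumed WQEs once every
 *                repost_delay messages; 0 means it never reposts
 */
static unsigned int count_rnr(unsigned int rq_depth, unsigned int msgs,
			      unsigned int repost_delay)
{
	unsigned int posted = rq_depth;	/* WQEs currently on the RQ */
	unsigned int pending = 0;	/* consumed, not yet reposted */
	unsigned int rnr = 0;
	unsigned int i;

	for (i = 0; i < msgs; i++) {
		/* The delayed softirq finally runs and reposts everything. */
		if (repost_delay && i && i % repost_delay == 0) {
			posted += pending;
			pending = 0;
		}
		if (posted == 0) {
			rnr++;		/* receiver not ready for this one */
			continue;
		}
		posted--;
		pending++;
	}
	return rnr;
}
```

With 16 receive WQEs, 48 messages and a repost only every 24
messages, count_rnr(16, 48, 24) reports 16 misses, while
count_rnr(48, 48, 24) reports none. When reposting keeps up, as in
count_rnr(16, 48, 8), 16 WQEs are enough.
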
Best regards,
Dust