From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
To: Halil Pasic <pasic@linux.ibm.com>, Dust Li <dust.li@linux.alibaba.com>
Cc: Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
"D. Wythe" <alibuda@linux.alibaba.com>,
Sidraya Jayagond <sidraya@linux.ibm.com>,
Wenjia Zhang <wenjia@linux.ibm.com>,
Mahanta Jambigi <mjambigi@linux.ibm.com>,
Tony Lu <tonylu@linux.alibaba.com>,
Wen Gu <guwen@linux.alibaba.com>,
netdev@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-s390@vger.kernel.org
Subject: Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable
Date: Wed, 24 Sep 2025 11:13:05 +0800
Message-ID: <06a87a92-6cce-4a63-99d0-463a1d035478@linux.alibaba.com>
In-Reply-To: <20250919165549.7bebfbc3.pasic@linux.ibm.com>
On 2025/9/19 22:55, Halil Pasic wrote:
> On Tue, 9 Sep 2025 12:18:50 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>
> Can maybe Wen Gu and Guangguan Wang chime in. From what I read
> link->wr_rx_buflen can be either SMC_WR_BUF_SIZE that is 48 in which
> case it does not matter, or SMC_WR_BUF_V2_SIZE that is 8192, if
> !smc_link_shared_v2_rxbuf(lnk) i.e. max_recv_sge == 1. So we talk
> about roughly a factor of 170 here. For a large pref_recv_wr the
> back of logic is still there to save us but I really would not say that
> this is how this is intended to work.
>
Hi Halil,

I think the root cause of the problem this patchset tries to solve is a mismatch
between SMC_WR_BUF_CNT and max_conns per lgr (which is 255). Furthermore,
I believe that 255 is not an optimal value for max_conns per lgr: too
few connections waste memory, while too many connections lead to I/O queuing
within a single QP (every WR posted via post_send to a single QP is initiated and completed in sequence).

We actually identified this problem long ago. In the Alibaba Cloud Linux distribution, we have
changed SMC_WR_BUF_CNT to 64 and reduced max_conns per lgr to 32 (for SMC-R V2.1). This
configuration has worked well under various workloads for a long time.

SMC-R V2.1 already supports negotiation of max_conns per lgr. Simply changing the value of
the macro SMC_CONN_PER_LGR_PREFER influences the negotiation result. But SMC-R V1.0 and SMC-R
V2.0 do not support negotiation of max_conns per lgr.

I think it is better to reduce SMC_CONN_PER_LGR_PREFER for SMC-R V2.1. But for SMC-R V1.0 and
SMC-R V2.0, I do not have any good ideas.
> Maybe not supporting V2 on devices with max_recv_sge is a better choice,
> assuming that a maximal V2 LLC msg needs to fit each and every receive
> WR buffer. Which seems to be the case based on 27ef6a9981fe ("net/smc:
> support SMC-R V2 for rdma devices with max_recv_sge equals to 1").
>
For RDMA devices whose max_recv_sge is 1, as mentioned in the commit log of the related patch,
it is better to support them than to fall back with SMC_CLC_DECL_INTERR, as the SMC_CLC_DECL_INTERR
fallback is not a fast fallback and may heavily affect the efficiency of connection establishment
on both the server and client sides.
> For me the best course of action seems to be to send a V3 using
> link->wr_rx_buflen. I'm really not that knowledgeable about RDMA or
> the SMC-R protocol, but I'm happy to be part of the discussion on this
> matter.
>
> Regards,
> Halil
And a small suggestion regarding the risk you mentioned in the commit log
("Addressing this by simply bumping SMC_WR_BUF_CNT to 256 was deemed
risky, because the large-ish physically continuous allocation could fail
and lead to TCP fall-backs."): non-physically contiguous allocation (vmalloc/vzalloc, etc.) is
also supported for wr buffers. SMC-R snd_buf and rmb already support non-physically
contiguous memory when sysctl_smcr_buf_type is set to SMCR_VIRT_CONT_BUFS or SMCR_MIXED_BUFS.
They can serve as an example of using non-physically contiguous memory.
Regards,
Guangguan Wang