Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable

public inbox for linux-kernel@vger.kernel.org

From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Dust Li <dust.li@linux.alibaba.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	"D. Wythe" <alibuda@linux.alibaba.com>,
	Sidraya Jayagond <sidraya@linux.ibm.com>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	Mahanta Jambigi <mjambigi@linux.ibm.com>,
	Tony Lu <tonylu@linux.alibaba.com>,
	Wen Gu <guwen@linux.alibaba.com>,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable
Date: Thu, 25 Sep 2025 11:48:46 +0800
Message-ID: <2aced457-5f1e-4c1a-b5ea-035240f73aaf@linux.alibaba.com>
In-Reply-To: <20250924115010.38d2f3cb.pasic@linux.ibm.com>



On 2025/9/24 17:50, Halil Pasic wrote:
> On Wed, 24 Sep 2025 11:13:05 +0800
> Guangguan Wang <guangguan.wang@linux.alibaba.com> wrote:
> 
>> On 2025/9/19 22:55, Halil Pasic wrote:
>>> On Tue, 9 Sep 2025 12:18:50 +0200
>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>
>>>
>>> Maybe Wen Gu and Guangguan Wang can chime in. From what I read,
>>> link->wr_rx_buflen can be either SMC_WR_BUF_SIZE, that is 48, in which
>>> case it does not matter, or SMC_WR_BUF_V2_SIZE, that is 8192, if
>>> !smc_link_shared_v2_rxbuf(lnk), i.e. max_recv_sge == 1. So we are talking
>>> about roughly a factor of 170 here. For a large pref_recv_wr the
>>> back-off logic is still there to save us, but I really would not say that
>>> this is how it is intended to work.
>>>   
>>
>> Hi Halil,
>>
>> I think the root cause of the problem this patchset tries to solve is a mismatch
>> between SMC_WR_BUF_CNT and the max_conns per lgr (whose value is 255). Furthermore,
>> I believe that 255 is not an optimal value for the max_conns per lgr, as too
>> few connections lead to a waste of memory and too many connections lead to I/O queuing
>> within a single QP (every WR posted via post_send to a single QP is initiated and completed in sequence).
>>
>> We actually identified this problem long ago. In the Alibaba Cloud Linux distribution, we have
>> changed SMC_WR_BUF_CNT to 64 and reduced max_conns per lgr to 32 (for SMC-R V2.1). This
>> configuration has worked well under various workloads for a long time.
>>
>> SMC-R V2.1 already supports negotiation of the max_conns per lgr. Simply changing the value of
>> the macro SMC_CONN_PER_LGR_PREFER influences the negotiation result. But SMC-R V1.0 and SMC-R
>> V2.0 do not support the negotiation of the max_conns per lgr.
>> I think it is better to reduce SMC_CONN_PER_LGR_PREFER for SMC-R V2.1. But for SMC-R V1.0 and
>> SMC-R V2.0, I do not have any good idea.
>>
> 
> I agree, the number of WR buffers and the max number of connections per
> lgr can and should be tuned in concert.
> 
>>> Maybe not supporting V2 on devices with max_recv_sge == 1 is a better choice,
>>> assuming that a maximal V2 LLC msg needs to fit each and every receive
>>> WR buffer. Which seems to be the case based on 27ef6a9981fe ("net/smc:
>>> support SMC-R V2 for rdma devices with max_recv_sge equals to 1").
>>>  
>>
>> For RDMA devices whose max_recv_sge is 1, as mentioned in the commit log of the related patch,
>> it is better to support them than to fall back with SMC_CLC_DECL_INTERR, as the SMC_CLC_DECL_INTERR
>> fallback is not a fast fallback and may heavily affect the efficiency of the connection process
>> on both the server and the client side.
> 
> I mean another possible mitigation of the problem could be the following:
> if there is a device in the mix with max_recv_sge < 2, then don't propose/
> accept SMC-R V2.
> 
> Do you know how prevalent and relevant max_recv_sge < 2 RDMA
> devices are, and how likely it is that somebody would like to use SMC-R with
> such devices?
> 

eRDMA in Alibaba Cloud has max_recv_sge < 2, and it is the RDMA device we are primarily focusing on.
eRDMA preferably runs on SMC-R V2.1. Would it be possible to support such devices in SMC-R V2.1 but not in V2.0?
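
To put rough numbers on what a max_recv_sge < 2 device means for the receive
WR buffers, here is a minimal userspace sketch (not kernel code). It uses only
the two buffer sizes quoted earlier in this thread, 48 and 8192 bytes. The
helper names wr_rx_array_bytes() and wr_rx_v2_factor() are made up for
illustration, and the buffer count is just a parameter, not a statement about
how many buffers the kernel actually allocates.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes as quoted in this thread. The names mirror the kernel macros, but
 * this is a standalone illustration, not kernel code. */
#define SMC_WR_BUF_SIZE		48u	/* regular receive WR buffer length */
#define SMC_WR_BUF_V2_SIZE	8192u	/* length used when max_recv_sge == 1 */

/* Hypothetical helper: bytes needed for one array of `count` receive
 * buffers of `buflen` bytes each. */
static size_t wr_rx_array_bytes(size_t count, size_t buflen)
{
	assert(buflen > 0);
	return count * buflen;
}

/* Hypothetical helper: how many times larger the V2-sized buffer is
 * (integer part of 8192 / 48). */
static unsigned int wr_rx_v2_factor(void)
{
	return SMC_WR_BUF_V2_SIZE / SMC_WR_BUF_SIZE;
}
```

With these definitions wr_rx_v2_factor() is 170, matching the "factor of 170"
above, and an array of 256 buffers takes 2097152 bytes (2 MiB) at 8192 bytes
each versus 12288 bytes at 48 bytes each.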

>>
>>  
>>> For me the best course of action seems to be to send a V3 using
>>> link->wr_rx_buflen. I'm really not that knowledgeable about RDMA or
>>> the SMC-R protocol, but I'm happy to be part of the discussion on this
>>> matter.
>>>
>>> Regards,
>>> Halil  
>>
>> And a tiny suggestion regarding the risk you mentioned in the commit log
>> ("Addressing this by simply bumping SMC_WR_BUF_CNT to 256 was deemed
>> risky, because the large-ish physically continuous allocation could fail
>> and lead to TCP fall-backs."). Non-physically contiguous allocation (vmalloc/vzalloc, etc.)
>> could also be used for wr buffers. The SMC-R snd_buf and rmb already support non-physically
>> contiguous memory when sysctl_smcr_buf_type is set to SMCR_VIRT_CONT_BUFS or SMCR_MIXED_BUFS.
>> They can serve as an example of using non-physically contiguous memory.
>>
> 
> I think we can put this on the list of possible enhancements. I would
> prefer not to add this to the scope of this series. But I would be happy to
> see this happen. I don't know if somebody from Alibaba, or maybe
> Mahanta or Sid, would like to pick this up as an enhancement on top.
> 
> Thank you very much for your comments!
> 
> Regards,
> Halil
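
The mismatch between the WR buffer count and the max_conns per lgr discussed
above can be put into rough numbers. The sketch below is a userspace
illustration only: conns_per_wr_buf() is a hypothetical helper, and
"connections per WR buffer" is an assumed, simplistic measure of contention,
not something the SMC code computes.

```c
#include <assert.h>

/* Hypothetical helper: worst-case number of connections that would have to
 * share one WR buffer if all max_conns connections of a link group were
 * active at once, i.e. ceil(max_conns / wr_buf_cnt). */
static unsigned int conns_per_wr_buf(unsigned int max_conns,
				     unsigned int wr_buf_cnt)
{
	assert(wr_buf_cnt > 0);
	return (max_conns + wr_buf_cnt - 1) / wr_buf_cnt;
}
```

With the values mentioned in this thread, conns_per_wr_buf(32, 64) is 1, while
conns_per_wr_buf(255, 64) is 4. Bumping the count to 256 with 255 connections
also gives 1, but at the price of the larger allocation.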

Thread overview: 9+ messages
2025-09-08 22:01 [PATCH net-next v2 0/2] net/smc: make wr buffer count configurable Halil Pasic
2025-09-08 22:01 ` [PATCH net-next v2 1/2] " Halil Pasic
2025-09-09  3:00   ` Dust Li
2025-09-09 10:18     ` Halil Pasic
2025-09-19 14:55       ` Halil Pasic
2025-09-24  3:13         ` Guangguan Wang
2025-09-24  9:50           ` Halil Pasic
2025-09-25  3:48             ` Guangguan Wang [this message]
2025-09-08 22:01 ` [PATCH net-next v2 2/2] net/smc: handle -ENOMEM from smc_wr_alloc_link_mem gracefully Halil Pasic