From: "dust.li" <dust.li@linux.alibaba.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>,
	Karsten Graul <kgraul@linux.ibm.com>,
	Tony Lu <tonylu@linux.alibaba.com>
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next v2] net/smc: Add autocork support
Date: Wed, 16 Feb 2022 23:39:02 +0800
Message-ID: <20220216153902.GC39286@linux.alibaba.com>
In-Reply-To: <c113554f9d3cdfbf3e148cc3400e106ba7bdb3c4.camel@linux.ibm.com>

On Wed, Feb 16, 2022 at 04:20:27PM +0100, Niklas Schnelle wrote:
>On Wed, 2022-02-16 at 20:00 +0800, Dust Li wrote:
>> This patch adds autocork support for SMC, which can improve
>> throughput for small messages by 2x to 4x.
>> 
>> The main idea is borrowed from TCP autocork, with some
>> RDMA-specific modifications:
>> 1. The first message is never corked, so that no extra
>>    latency is introduced.
>> 2. If we have posted any Tx WRs to the NIC that have not
>>    completed, cork new messages until:
>>    a) We receive the CQE for the last Tx WR, or
>>    b) We have corked enough messages on the connection.
>> 3. Try to push the corked data out when we receive the CQE
>>    of the last Tx WR, so that corked messages do not hang
>>    in the send queue.
>> 
>> Both SMC autocork and TCP autocork check TX completions
>> to decide whether to cork. The difference is that an SMC
>> Tx WR completion means the data has been confirmed by the
>> RNIC, while a TCP TX completion only tells us the data
>> has been sent out by the local NIC.
>> 
>> Add an atomic variable tx_pushing to smc_connection to make
>> sure only one context sends at a time, which lets it cork more
>> data and saves CDC slots.
>> 
>> SMC autocork should not add latency, since the first
>> message is always sent out immediately.
>> 
>> The qperf tcp_bw test shows a more than 4x throughput increase
>> for small message sizes on a Mellanox ConnectX-4 Lx; other
>> throughput benchmarks such as sockperf and netperf give the
>> same result.
>> The qperf tcp_lat test shows that SMC autocork does not
>> increase ping-pong latency.
>> 
>> BW test:
>>  client: smc_run taskset -c 1 qperf smc-server -oo msg_size:1:64K:*2 \
>> 			-t 30 -vu tcp_bw
>>  server: smc_run taskset -c 1 qperf
>> 
>> MsgSize(Bytes)        TCP     SMC-NoCork(vs TCP)   SMC-AutoCork(vs TCP)
>>       1         2.57 MB/s     698 KB/s(-73.5%)     2.98 MB/s(16.0% )
>>       2          5.1 MB/s    1.41 MB/s(-72.4%)     5.82 MB/s(14.1% )
>>       4         10.2 MB/s    2.83 MB/s(-72.3%)     11.7 MB/s(14.7% )
>>       8         20.8 MB/s    5.62 MB/s(-73.0%)     22.9 MB/s(10.1% )
>>      16         42.5 MB/s    11.5 MB/s(-72.9%)     45.5 MB/s(7.1%  )
>>      32         80.7 MB/s    22.3 MB/s(-72.4%)     86.7 MB/s(7.4%  )
>>      64          155 MB/s    45.6 MB/s(-70.6%)      160 MB/s(3.2%  )
>>     128          295 MB/s    90.1 MB/s(-69.5%)      273 MB/s(-7.5% )
>>     256          539 MB/s     179 MB/s(-66.8%)      610 MB/s(13.2% )
>>     512          943 MB/s     360 MB/s(-61.8%)     1.02 GB/s(10.8% )
>>    1024         1.58 GB/s     710 MB/s(-56.1%)     1.91 GB/s(20.9% )
>>    2048         2.47 GB/s    1.34 GB/s(-45.7%)     2.92 GB/s(18.2% )
>>    4096         2.86 GB/s     2.5 GB/s(-12.6%)      2.4 GB/s(-16.1%)
>>    8192         3.89 GB/s    3.14 GB/s(-19.3%)     4.05 GB/s(4.1%  )
>>   16384         3.29 GB/s    4.67 GB/s(41.9% )     5.09 GB/s(54.7% )
>>   32768         2.73 GB/s    5.48 GB/s(100.7%)     5.49 GB/s(101.1%)
>>   65536            3 GB/s    4.85 GB/s(61.7% )     5.24 GB/s(74.7% )
>> 
>> Latency test:
>>  client: smc_run taskset -c 1 qperf smc-server -oo msg_size:1:64K:*2 \
>> 			-t 30 -vu tcp_lat
>>  server: smc_run taskset -c 1 qperf
>> 
>>  MsgSize(Bytes)       SMC-NoCork           SMC-AutoCork(vs NoCork)
>>        1               9.7 us               9.6 us( -1.03%)
>>        2              9.43 us              9.39 us( -0.42%)
>>        4               9.6 us              9.35 us( -2.60%)
>>        8              9.42 us               9.2 us( -2.34%)
>>       16              9.13 us              9.43 us(  3.29%)
>>       32              9.19 us               9.5 us(  3.37%)
>>       64              9.38 us               9.5 us(  1.28%)
>>      128               9.9 us              9.29 us( -6.16%)
>>      256              9.42 us              9.26 us( -1.70%)
>>      512                10 us              9.45 us( -5.50%)
>>     1024              10.4 us               9.6 us( -7.69%)
>>     2048              10.4 us              10.2 us( -1.92%)
>>     4096                11 us              10.5 us( -4.55%)
>>     8192              11.7 us              11.8 us(  0.85%)
>>    16384              14.5 us              14.2 us( -2.07%)
>>    32768              19.4 us              19.3 us( -0.52%)
>>    65536              28.1 us              28.8 us(  2.49%)
>
>This is quite an impressive improvement! Thanks for your effort!
>
>Could you share a bit more about how you performed these tests to give
>a bit more context and allow us to reproduce them on s390. I'm assuming
>the ConnectX-4 Lx card you're using is a 50 Gb/s model? Are you doing
>these tests on two bare metal hosts, one host with client/server
>namespaces, or between VMs? If it's namespaces or VMs are you using VFs
>from the same card/port or different cards. If it is two cards/ports do
>you have a switch or a cross cable between them?

Sure

I ran the tests in a VM environment: two VMs on a single physical host,
each with one VF passed through from the same ConnectX-4 Lx card.
The card is dual 25 Gbps, so the internal chip should support 50 Gbps.
A rough diagram of the test setup:

-------------------------------------
|  ---------           ---------    |
|  |       |           |       |    |
|  |  VM1  |           |  VM2  |    |
|  |       |           |       |    |
|  ---VF1---           ---VF2---    |
|      ^                   ^        |
|      |                   |        |
|      |----- CX-4 Lx -----|        |
|                             Host  |
-------------------------------------
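
For illustration, the corking rules quoted from the patch description
(items 1 to 3) can be modeled in user space roughly as follows. This is
only a simplified sketch, not the patch code: struct conn_model, the
cork_limit field and all model_* function names are hypothetical, and
the real implementation works on smc_connection inside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one connection; not kernel code. */
struct conn_model {
	unsigned int pending_tx_wrs; /* Tx WRs posted, CQE not yet received */
	size_t corked_bytes;         /* data queued but not yet posted */
	size_t cork_limit;           /* assumed "corked enough" threshold */
};

/* Rules 1 and 2: cork only while a Tx WR is in flight and the
 * amount of corked data is still within the limit. */
bool model_should_cork(const struct conn_model *c)
{
	if (c->pending_tx_wrs == 0)
		return false;	/* nothing in flight: send at once */
	if (c->corked_bytes > c->cork_limit)
		return false;	/* 2b: corked enough, push now */
	return true;		/* 2: keep corking until the CQE */
}

/* Post everything queued as one Tx WR; returns the bytes posted. */
size_t model_push(struct conn_model *c)
{
	size_t n = c->corked_bytes;

	if (n == 0)
		return 0;
	c->corked_bytes = 0;
	c->pending_tx_wrs++;
	return n;
}

/* Sender path: queue len bytes, then push unless corking applies. */
size_t model_send(struct conn_model *c, size_t len)
{
	c->corked_bytes += len;
	return model_should_cork(c) ? 0 : model_push(c);
}

/* Completion path (rules 2a and 3): once the CQE of the last
 * outstanding Tx WR arrives, push whatever has been corked. */
size_t model_tx_complete(struct conn_model *c)
{
	assert(c->pending_tx_wrs > 0);
	c->pending_tx_wrs--;
	return c->pending_tx_wrs == 0 ? model_push(c) : 0;
}
```

In this model, with cork_limit set to 64, a first 10-byte send is posted
immediately, two further 10-byte sends are corked while that WR is in
flight, and the completion then posts the 20 corked bytes as one WR.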

