From: Wen Gu <guwen@linux.alibaba.com>
To: wintera@linux.ibm.com, wenjia@linux.ibm.com, hca@linux.ibm.com,
gor@linux.ibm.com, agordeev@linux.ibm.com, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
kgraul@linux.ibm.com, jaka@linux.ibm.com
Cc: borntraeger@linux.ibm.com, svens@linux.ibm.com,
alibuda@linux.alibaba.com, tonylu@linux.alibaba.com,
guwen@linux.alibaba.com, linux-s390@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH net-next 00/13] net/smc: implement loopback-ism used by SMC-D
Date: Sun, 10 Dec 2023 21:24:01 +0800
Message-ID: <1702214654-32069-1-git-send-email-guwen@linux.alibaba.com>
(Note that this patch set depends on the virtual ISM support, which is still under review:
https://lore.kernel.org/netdev/1702021259-41504-1-git-send-email-guwen@linux.alibaba.com/)
This patch set is the second part of the new version of [1]. The changes
since [1] are listed at the end.
# Background
SMC-D is currently used on IBM Z with the ISM function to optimize the network
interconnect for intra-CPC communication. Inspired by this, we try to make SMC-D
available on non-s390 architectures through a software-implemented virtual ISM
device, the loopback-ism device introduced here, to accelerate inter-process or
inter-container communication within the same OS instance.
# Design
This patch set consists of 3 parts:
- Patch #1-#2: preparatory work for loopback-ism.
- Patch #3-#9: implement the loopback-ism device.
- Patch #10-#13: memory copy optimization for the loopback scenario.
The loopback-ism device is designed as an ISMv2 device and is not limited to
a specific net namespace. Both ends of an inter-process connection (1/1' in the
diagram below) or of an inter-container connection (2/2' in the diagram below)
can find the same available loopback-ism device and choose it during the CLC
handshake.
Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+      +-------+      +-------+ |    |        +-------+        |
| | App A |      | App B |      | App C | |    |        | App D |<-+     |
| +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
|     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
|  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
|     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
+---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
             |   |           |                                  |
Kernel       |   |           |                                  |
+----+-------v---+-----------v----------------------------------+---+----+
|    |                            TCP                               |    |
|    |                                                              |    |
|    +--------------------------------------------------------------+    |
|                                                                        |
|                           +--------------+                             |
|                           | smc loopback |                             |
+---------------------------+--------------+-----------------------------+
The loopback-ism device creates DMBs (shared memory) for each connection peer.
Since data transfer occurs within the same kernel, the sndbuf of each peer
is only a descriptor that points to the same memory region as the peer's DMB,
so the data copy from sndbuf to peer DMB is avoided in the loopback-ism case.
Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+                               |    |        +-------+        |
| | App C |-----+                         |    |        | App D |        |
| +-------+     |                         |    |        +-^-----+        |
|               |                         |    |          |              |
|           (2) |                         |    |     (2') |              |
|               |                         |    |          |              |
+---------------|-------------------------+    +----------|--------------+
                |                                         |
Kernel          |                                         |
+---------------|-----------------------------------------|--------------+
| +--------+ +--v-----+                           +--------+ +--------+  |
| |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
| +-----|--+ +--|-----+                           +-----|--+ +--------+  |
| +-----|--+    |                                 +-----|--+             |
| | DMB C  |    +---------------------------------| DMB D  |             |
| +--------+                                      +--------+             |
|                                                                        |
|                           +--------------+                             |
|                           | smc loopback |                             |
+---------------------------+--------------+-----------------------------+
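
As a rough user-space illustration of the descriptor idea in the diagram above
(not the kernel implementation), the sketch below models a DMB as a bytearray
and a ghost sndbuf as an object that owns no memory and only aliases the peer's
DMB once attached. The names Dmb, GhostSndbuf, attach(), detach() and write()
are hypothetical and chosen only for this example.

```python
class Dmb:
    """Hypothetical stand-in for a DMB: a fixed-size receive buffer."""

    def __init__(self, size):
        self.mem = bytearray(size)


class GhostSndbuf:
    """Hypothetical descriptor-only sndbuf: it owns no memory of its own."""

    def __init__(self):
        self.view = None

    def attach(self, peer_dmb):
        # Alias the peer's memory instead of allocating a private buffer.
        self.view = memoryview(peer_dmb.mem)

    def detach(self):
        # Drop the alias; the peer DMB itself is left untouched.
        if self.view is not None:
            self.view.release()
            self.view = None

    def write(self, offset, data):
        # The only copy is from the caller's data into the peer DMB.
        self.view[offset:offset + len(data)] = data


dmb_d = Dmb(16)           # receive buffer of the peer (App D in the diagram)
snd_c = GhostSndbuf()     # send side of App C
snd_c.attach(dmb_d)
snd_c.write(0, b"hello")
received = bytes(dmb_d.mem[:5])   # data is visible in DMB D without a second copy
snd_c.detach()
```

In this model a write through the sender's descriptor lands directly in the
receiver's buffer, which is the copy that the sndbuf-to-DMB step would
otherwise add.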
# Benchmark Test
* Test environments:
- VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
- SMC sndbuf/DMB size 1MB.
* Test objects:
- TCP: run on TCP loopback.
- domain: run on UNIX domain.
- SMC lo: run on SMC loopback device.
1. ipc-benchmark (see [2])
- ./<foo> -c 1000000 -s 100
                         TCP          SMC-lo
Message rate (msg/s)     81539        151251 (+85.50%)
2. sockperf
- serv: <smc_run> taskset -c <cpu> sockperf sr --tcp
- clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
                         TCP          SMC-lo
Bandwidth (MBps)         5313.66      8270.51 (+55.65%)
Latency (us)             5.806        3.207 (-44.76%)
3. nginx/wrk
- serv: <smc_run> nginx
- clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
                         TCP          SMC-lo
Requests/s               194641.79    258656.13 (+32.89%)
4. redis-benchmark
- serv: <smc_run> redis-server
- clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
                         TCP          SMC-lo
GET (Requests/s)         85855.34     115640.35 (+34.69%)
SET (Requests/s)         86337.15     118203.30 (+36.90%)
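
The percentages in parentheses in the tables above appear to be the change of
the SMC-lo value relative to the TCP baseline. A small sketch for recomputing
them from the raw numbers follows; relative_change() is a helper written for
this note, not part of any benchmark tool.

```python
def relative_change(baseline, measured):
    """Percentage change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline * 100.0


# (TCP, SMC-lo) pairs copied from the benchmark tables above.
results = {
    "ipc-benchmark message rate (msg/s)": (81539, 151251),
    "sockperf bandwidth (MBps)": (5313.66, 8270.51),
    "sockperf latency (us)": (5.806, 3.207),
    "nginx/wrk requests/s": (194641.79, 258656.13),
    "redis GET requests/s": (85855.34, 115640.35),
    "redis SET requests/s": (86337.15, 118203.30),
}

for name, (tcp, smc_lo) in results.items():
    print(f"{name}: {relative_change(tcp, smc_lo):+.2f}%")
```

Note that for latency a negative change is the improvement, while for the
throughput-style metrics a positive change is.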
[1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
[2] https://github.com/goldsborough/ipc-bench
Changes in this version compared to [1]:
- Patch #1: improve the loopback-ism dump, which now looks as follows:
  # smcd d
  FID   Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
  0000  0     loopback-ism  ffff   No     0
- Patch #3: introduce the smc_ism_set_v2_capable() helper and set
  smc_ism_v2_capable when an ISMv2 or virtual ISM device is registered,
  regardless of whether there is already a device in the smcd device list.
- Patch #3: loopback-ism is now added under /sys/devices/virtual/smc/loopback-ism/.
- Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
to activate or deactivate the loopback-ism.
- Patch #9: introduce loopback-ism statistics, exposed through
  /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_bytes|dmbs_cnt}.
- Some minor changes and comment improvements.
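
For anyone who wants to look at the sysfs attributes mentioned in the list
above, here is a minimal sketch of reading them from user space. It assumes
that the statistics pattern expands to tx_bytes, rx_bytes and dmbs_cnt and that
each attribute is a small text file; the value formats are not specified in
this cover letter, so the helper (read_loopback_ism_attrs(), a name made up for
this example) returns the raw text and does not try to parse it.

```python
from pathlib import Path

# Directory named in the changelog above. The attribute names assume that
# {{tx|rx}_bytes|dmbs_cnt} expands to tx_bytes, rx_bytes and dmbs_cnt.
LOOPBACK_ISM_DIR = "/sys/devices/virtual/smc/loopback-ism"
ATTRS = ("active", "tx_bytes", "rx_bytes", "dmbs_cnt")


def read_loopback_ism_attrs(base=LOOPBACK_ISM_DIR):
    """Return {attribute: raw text, or None if the file cannot be read}."""
    values = {}
    for name in ATTRS:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable, e.g. loopback-ism not built in.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_loopback_ism_attrs().items():
        print(f"{name}: {value}")
```

On a kernel without these patches every value is simply None, so the sketch is
safe to run anywhere.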
Wen Gu (13):
net/smc: improve SMC-D device dump for virtual ISM
net/smc: decouple specialized struct from SMC-D DMB registration
net/smc: introduce virtual ISM device loopback-ism
net/smc: implement ID-related operations of loopback-ism
net/smc: implement some unsupported operations of loopback-ism
net/smc: implement DMB-related operations of loopback-ism
net/smc: register loopback-ism into SMC-D device list
net/smc: introduce loopback-ism runtime switch
net/smc: introduce loopback-ism statistics attributes
net/smc: introduce operations to {at|de}tach ghost sndbuf to peer DMB
net/smc: attach or detach ghost sndbuf to peer DMB.
net/smc: adapt cursor update when sndbuf is mapped to peer DMB
net/smc: implement {at|de}tach_dmb interfaces of loopback-ism
drivers/s390/net/ism_drv.c | 2 +-
include/net/smc.h | 6 +-
net/smc/Kconfig | 13 +
net/smc/Makefile | 2 +-
net/smc/af_smc.c | 33 ++-
net/smc/smc_cdc.c | 58 ++++-
net/smc/smc_cdc.h | 1 +
net/smc/smc_core.c | 71 +++++-
net/smc/smc_core.h | 1 +
net/smc/smc_ism.c | 69 +++++-
net/smc/smc_ism.h | 5 +
net/smc/smc_loopback.c | 603 +++++++++++++++++++++++++++++++++++++++++++++
net/smc/smc_loopback.h | 80 ++++++
13 files changed, 915 insertions(+), 29 deletions(-)
create mode 100644 net/smc/smc_loopback.c
create mode 100644 net/smc/smc_loopback.h
--
1.8.3.1