From: Wen Gu <guwen@linux.alibaba.com>
To: wintera@linux.ibm.com, wenjia@linux.ibm.com, hca@linux.ibm.com,
gor@linux.ibm.com, agordeev@linux.ibm.com, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
kgraul@linux.ibm.com, jaka@linux.ibm.com
Cc: borntraeger@linux.ibm.com, svens@linux.ibm.com,
alibuda@linux.alibaba.com, tonylu@linux.alibaba.com,
guwen@linux.alibaba.com, linux-s390@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH net-next 00/13] net/smc: implement loopback-ism used by SMC-D
Date: Sun, 10 Dec 2023 21:24:01 +0800
Message-ID: <1702214654-32069-1-git-send-email-guwen@linux.alibaba.com>
(Note that this patch set depends on virtual ISM support, which is under review:
https://lore.kernel.org/netdev/1702021259-41504-1-git-send-email-guwen@linux.alibaba.com/)
This patch set is the second part of the new version of [1]. The changes in
this version are listed at the end.
# Background
SMC-D is currently used on IBM Z together with the ISM function to optimize the
network interconnect for intra-CPC communication. Inspired by this, we make
SMC-D available on non-s390 architectures through a software-implemented
virtual ISM device, the loopback-ism device introduced here, to accelerate
inter-process and inter-container communication within the same OS instance.
# Design
This patch set consists of 3 parts:
- Patch #1-#2: preparatory work for loopback-ism.
- Patch #3-#9: implement the loopback-ism device.
- Patch #10-#13: memory copy optimization for the loopback scenario.
The loopback-ism device is designed as an ISMv2 device and is not limited to
a specific net namespace. Both ends of an inter-process connection (1/1' in the
diagram below) or of an inter-container connection (2/2' in the diagram below)
can find the same available loopback-ism device and choose it during the CLC
handshake.
Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+      +-------+      +-------+ |    |        +-------+        |
| | App A |      | App B |      | App C | |    |        | App D |<-+     |
| +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
|     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
|  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
|     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
+---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
             |   |           |                                  |
Kernel       |   |           |                                  |
+----+-------v---+-----------v----------------------------------+---+----+
|    |                             TCP                              |    |
|    |                                                              |    |
|    +--------------------------------------------------------------+    |
|                                                                        |
|                           +--------------+                             |
|                           | smc loopback |                             |
+---------------------------+--------------+-----------------------------+
The loopback-ism device creates a DMB (shared memory) for each connection peer.
Since data transfer occurs within the same kernel, the sndbuf of each peer
is only a descriptor that points to the same memory region as the peer's DMB,
so the data copy from sndbuf to peer DMB is avoided in the loopback-ism case.
Container 1 (ns1)                              Container 2 (ns2)
+-----------------------------------------+    +-------------------------+
| +-------+                               |    |        +-------+        |
| | App C |-----+                         |    |        | App D |        |
| +-------+     |                         |    |        +-^-----+        |
|               |                         |    |          |              |
|           (2) |                         |    |     (2') |              |
|               |                         |    |          |              |
+---------------|-------------------------+    +----------|--------------+
                |                                         |
Kernel          |                                         |
+---------------|-----------------------------------------|--------------+
| +--------+ +--v-----+                           +--------+ +--------+  |
| |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
| +-----|--+ +--|-----+                           +-----|--+ +--------+  |
| +-----|--+    |                                 +-----|--+             |
| | DMB C  |    +---------------------------------| DMB D  |             |
| +--------+                                      +--------+             |
|                                                                        |
|                           +--------------+                             |
|                           | smc loopback |                             |
+---------------------------+--------------+-----------------------------+
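The descriptor-only ("ghost") sndbuf idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct and function names (`dmb`, `snd_desc`, `snd_attach`, `snd_write`, ...) are hypothetical and are not the kernel symbols added by this series. It only shows that a write through the attached descriptor lands directly in the peer's DMB, with no intermediate send buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model; these are NOT the kernel's structures. */
struct dmb {                    /* receive buffer owned by one peer */
	unsigned char *mem;
	size_t len;
};

struct snd_desc {               /* "ghost" sndbuf: a descriptor without own memory */
	unsigned char *mem;     /* aliases the peer's DMB memory once attached */
	size_t len;
};

/* Allocate a zeroed DMB of len bytes. Returns 0 on success, -1 on failure. */
static int dmb_create(struct dmb *d, size_t len)
{
	d->mem = calloc(1, len);
	if (!d->mem)
		return -1;
	d->len = len;
	return 0;
}

static void dmb_destroy(struct dmb *d)
{
	free(d->mem);
	d->mem = NULL;
	d->len = 0;
}

/* Attach: point the sender's descriptor at the peer DMB instead of allocating. */
static void snd_attach(struct snd_desc *s, const struct dmb *peer)
{
	s->mem = peer->mem;
	s->len = peer->len;
}

/* Detach: drop the alias; the DMB itself stays owned by the peer. */
static void snd_detach(struct snd_desc *s)
{
	s->mem = NULL;
	s->len = 0;
}

/*
 * Write through the ghost sndbuf. The bytes are stored once, directly in the
 * peer DMB. Returns the number of bytes written (clamped to the DMB size).
 */
static size_t snd_write(struct snd_desc *s, size_t off, const void *data, size_t n)
{
	if (!s->mem || off >= s->len)
		return 0;
	if (n > s->len - off)
		n = s->len - off;
	memcpy(s->mem + off, data, n);
	return n;
}
```

In this model, App C would call `snd_attach(&snd_c, &dmb_d)` once per connection and then `snd_write()` for each send, after which App D reads the data from `dmb_d.mem`. Cursor handling and synchronization between the peers are deliberately left out.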
# Benchmark Test
* Test environments:
- VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
- SMC sndbuf/DMB size 1MB.
* Test objects:
- TCP: run on TCP loopback.
- domain: run on UNIX domain sockets.
- SMC lo: run on the SMC loopback device.
1. ipc-benchmark (see [2])
- ./<foo> -c 1000000 -s 100
                         TCP              SMC-lo
Message rate (msg/s)     81539            151251(+85.50%)
2. sockperf
- serv: <smc_run> taskset -c <cpu> sockperf sr --tcp
- clnt: <smc_run> taskset -c <cpu> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
                         TCP              SMC-lo
Bandwidth(MBps)          5313.66          8270.51(+55.65%)
Latency(us)              5.806            3.207(-44.76%)
3. nginx/wrk
- serv: <smc_run> nginx
- clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
                         TCP              SMC-lo
Requests/s               194641.79        258656.13(+32.89%)
4. redis-benchmark
- serv: <smc_run> redis-server
- clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
                         TCP              SMC-lo
GET(Requests/s)          85855.34         115640.35(+34.69%)
SET(Requests/s)          86337.15         118203.30(+36.90%)
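The benchmarks above run unmodified TCP applications, optionally wrapped by `<smc_run>`. For readers who want to request SMC explicitly instead, the following is a minimal sketch, assuming a Linux kernel built with SMC support. The helper name `open_stream_socket` is hypothetical. The fallback values for `AF_SMC` (43) and `SMCPROTO_SMC` (0) are taken from the Linux kernel headers and are only used when the libc headers do not provide them.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_SMC
#define AF_SMC 43          /* assumption: value from the Linux socket headers */
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0     /* assumption: SMC over IPv4, as in the kernel's smc.h */
#endif

/*
 * Hypothetical helper: open a stream socket, preferring SMC and falling back
 * to plain TCP when AF_SMC is not available on this system.
 * Sets *used_smc to 1 or 0. Returns the file descriptor, or -1 on failure.
 */
static int open_stream_socket(int *used_smc)
{
	int fd = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC);

	if (fd >= 0) {
		*used_smc = 1;
		return fd;
	}
	*used_smc = 0;
	return socket(AF_INET, SOCK_STREAM, 0);
}
```

The returned descriptor is then used with the usual `bind()`, `listen()`, and `connect()` calls and a `struct sockaddr_in` address such as 127.0.0.1. Whether a given connection actually ends up on loopback-ism is decided during the CLC handshake and is not visible from this sketch.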
[1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
[2] https://github.com/goldsborough/ipc-bench
Changes in this version compared to [1]:
- Patch #1: improve the loopback-ism dump. It now shows as follows:
# smcd d
FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
0000 0     loopback-ism  ffff   No        0
- Patch #3: introduce the smc_ism_set_v2_capable() helper and set
smc_ism_v2_capable when an ISMv2 or virtual ISM device is registered,
regardless of whether there is already a device in the smcd device list.
- Patch #3: loopback-ism is now added under /sys/devices/virtual/smc/loopback-ism/.
- Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
to activate or deactivate the loopback-ism.
- Patch #9: introduce loopback-ism statistics, exposed via
/sys/devices/virtual/smc/loopback-ism/{{tx|rx}_bytes|dmbs_cnt}.
- Some minor changes and comment improvements.
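As an illustration of how the runtime switch and statistics attributes listed above could be inspected from user space, here is a minimal sketch. It assumes the attribute files are named `active`, `tx_bytes`, `rx_bytes`, and `dmbs_cnt` under the sysfs directory given in the change list, and that each holds a single line of text. The helpers `read_attr` and `dump_loopback_ism` are hypothetical and not part of this series, and the value format is not specified here, so the values are printed verbatim.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Directory taken from the change list above. */
#define LOOPBACK_ISM_SYSFS "/sys/devices/virtual/smc/loopback-ism"

/*
 * Read the first line of the attribute file dir/attr into buf
 * (NUL-terminated, trailing newline stripped).
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_attr(const char *dir, const char *attr, char *buf, size_t buflen)
{
	char path[512];
	FILE *f;
	int n;

	if (buflen == 0)
		return -1;
	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Print the switch state and the counters found under dir.
 * Returns the number of attributes that could be read.
 */
static int dump_loopback_ism(const char *dir)
{
	/* Assumed attribute names; adjust if the final interface differs. */
	static const char *const attrs[] = {
		"active", "tx_bytes", "rx_bytes", "dmbs_cnt"
	};
	char val[64];
	int found = 0;

	for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
		if (read_attr(dir, attrs[i], val, sizeof(val)) == 0) {
			printf("%s: %s\n", attrs[i], val);
			found++;
		} else {
			printf("%s: <unavailable>\n", attrs[i]);
		}
	}
	return found;
}
```

Calling `dump_loopback_ism(LOOPBACK_ISM_SYSFS)` on a kernel without this series simply reports every attribute as unavailable and returns 0.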
Wen Gu (13):
net/smc: improve SMC-D device dump for virtual ISM
net/smc: decouple specialized struct from SMC-D DMB registration
net/smc: introduce virtual ISM device loopback-ism
net/smc: implement ID-related operations of loopback-ism
net/smc: implement some unsupported operations of loopback-ism
net/smc: implement DMB-related operations of loopback-ism
net/smc: register loopback-ism into SMC-D device list
net/smc: introduce loopback-ism runtime switch
net/smc: introduce loopback-ism statistics attributes
net/smc: introduce operations to {at|de}tach ghost sndbuf to peer DMB
net/smc: attach or detach ghost sndbuf to peer DMB.
net/smc: adapt cursor update when sndbuf is mapped to peer DMB
net/smc: implement {at|de}tach_dmb interfaces of loopback-ism
drivers/s390/net/ism_drv.c | 2 +-
include/net/smc.h | 6 +-
net/smc/Kconfig | 13 +
net/smc/Makefile | 2 +-
net/smc/af_smc.c | 33 ++-
net/smc/smc_cdc.c | 58 ++++-
net/smc/smc_cdc.h | 1 +
net/smc/smc_core.c | 71 +++++-
net/smc/smc_core.h | 1 +
net/smc/smc_ism.c | 69 +++++-
net/smc/smc_ism.h | 5 +
net/smc/smc_loopback.c | 603 +++++++++++++++++++++++++++++++++++++++++++++
net/smc/smc_loopback.h | 80 ++++++
13 files changed, 915 insertions(+), 29 deletions(-)
create mode 100644 net/smc/smc_loopback.c
create mode 100644 net/smc/smc_loopback.h
--
1.8.3.1