From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH v1 00/22] client-side NFS/RDMA patches proposed for v4.9
Date: Mon, 15 Aug 2016 16:50:10 -0400
Message-ID: <20160815195649.11652.32252.stgit@manet.1015granger.net>
Posted for review, the following patch series makes these changes:
- Correct use of DMA API
- Delay DMA mapping to permit device driver unload
- Introduce simple RDMA-CM private message exchange
- Support Remote Invalidation
- Support s/g list when sending RPC calls
Available in the "nfs-rdma-for-4.9" topic branch of this git repo:
git://git.linux-nfs.org/projects/cel/cel-2.6.git
Or for browsing:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.9
== Performance results ==
This is NFSv3 / RDMA, CX-3 Pro (FRWR) on a 12-core dual-socket
client and an 8-core single-socket server. The exported fs is a
tmpfs. Note that iozone reports latency for a system call, not
RPC round-trip.
Test #1: The inline threshold is set to 1KB, and Remote Invalidation
is disabled (RPC-over-RDMA Version One baseline).
O_DIRECT feature enabled
Microseconds/op Mode. Output is in microseconds per operation.
Command line used: /home/cel/bin/iozone -i0 -i1 -s128m -y1k -az -I -N
      KB  reclen   write rewrite    read  reread
  131072       1      61      62      51      51
  131072       2      63      62      51      51
  131072       4      64      63      52      51
  131072       8      67      66      54      52
  131072      16      71      70      56      56
  131072      32      83      80      63      63
  131072      64     104     100      83      82
O_DIRECT feature enabled
OPS Mode. Output is in operations per second.
Command line used: /home/cel/bin/iozone -i0 -i1 -s16m -r4k -t12 -I -O
Throughput test with 12 processes
Each process writes a 16384 Kbyte file in 4 Kbyte records
Children see throughput for 12 readers = 84198.24 ops/sec
Parent sees throughput for 12 readers = 84065.36 ops/sec
Min throughput per process = 5925.38 ops/sec
Max throughput per process = 7346.19 ops/sec
Avg throughput per process = 7016.52 ops/sec
Min xfer = 3300.00 ops
Test #2: The inline threshold is set to 4KB, and Remote Invalidation
is enabled. This means I/O payloads smaller than about 3.9KB do not
use explicit RDMA at all, and no LOCAL_INV WR is needed for operations
that do use RDMA.
O_DIRECT feature enabled
Microseconds/op Mode. Output is in microseconds per operation.
Command line used: /home/cel/bin/iozone -i0 -i1 -s128m -y1k -az -I -N
      KB  reclen   write rewrite    read  reread
  131072       1      41      43      37      37
  131072       2      44      44      37      37
  131072       4      61      59      41      41
  131072       8      63      62      43      43
  131072      16      68      66      47      47
  131072      32      76      72      53      53
  131072      64     100      95      70      70
O_DIRECT feature enabled
OPS Mode. Output is in operations per second.
Command line used: /home/cel/bin/iozone -i0 -i1 -s16m -r4k -t12 -I -O
Throughput test with 12 processes
Each process writes a 16384 Kbyte file in 4 Kbyte records
Children see throughput for 12 readers = 111520.52 ops/sec
Parent sees throughput for 12 readers = 111250.80 ops/sec
Min throughput per process = 8463.72 ops/sec
Max throughput per process = 9658.81 ops/sec
Avg throughput per process = 9293.38 ops/sec
Min xfer = 3596.00 ops
== Analysis ==
To understand these results, note that:
Typical round-trip latency in this configuration for LOOKUP, ACCESS
and GETATTR (which bear no data payload) is 30-35us.
- An NFS READ call is a pure inline RDMA Send
- A small NFS READ reply is a pure inline RDMA Send
- A large NFS READ reply is an RDMA Write followed by an RDMA Send
- A small NFS WRITE call is a pure inline RDMA Send
- A large NFS WRITE call is an RDMA Send followed by the
server doing an RDMA Read
- An NFS WRITE reply is a pure inline RDMA Send
In Test #2, the 1KB and 2KB I/Os are all pure inline. No explicit
RDMA operation is involved. At 4KB and above, explicit RDMA is used
with a single STag. The server invalidates each RPC's STag, so no
LOCAL_INV WR is needed on the client for Test #2.
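The mapping above can be sketched roughly as follows. This is an
illustration only, not code from the series: the function name
transfers() and the header_overhead parameter are hypothetical, and
the 128-byte default is an assumption chosen so that a 4KB inline
threshold leaves room for about 3.9KB of payload, as described for
Test #2.

```python
def transfers(op, payload, inline_threshold, header_overhead=128):
    """Return (call_ops, reply_ops) for one NFS READ or WRITE.

    payload and inline_threshold are in bytes. header_overhead is an
    assumed allowance for RPC and transport headers, not a measured value.
    """
    fits_inline = payload + header_overhead <= inline_threshold

    if op == "READ":
        # The call carries no data payload, so it is always inline.
        call = ["RDMA Send"]
        reply = ["RDMA Send"] if fits_inline else ["RDMA Write", "RDMA Send"]
    elif op == "WRITE":
        # A large call is a Send, after which the server pulls the payload.
        call = ["RDMA Send"] if fits_inline else ["RDMA Send", "RDMA Read (by server)"]
        reply = ["RDMA Send"]
    else:
        raise ValueError("op must be 'READ' or 'WRITE'")
    return call, reply


# Test #2 settings: 4KB inline threshold.
print(transfers("READ", 2048, 4096))   # small reply, pure inline
print(transfers("READ", 4096, 4096))   # large reply, explicit RDMA
print(transfers("WRITE", 4096, 4096))  # large call, server does RDMA Read
```

With the Test #1 threshold of 1KB, the same sketch reports explicit
RDMA for every record length in the tables, which matches the
description of that test as the baseline.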
The key take-aways are that:
- For small payloads, NFS READ using RDMA Write with Remote
Invalidation is nearly as fast as pure inline; both modes take
about 40us per RPC
- The NFS READ improvement with Remote Invalidation enabled
persists at 8KB payloads and above, but the roughly 10us saved is
relatively small compared to other transmission costs
- For small payloads, the RDMA Read round-trip still adds
significant per-WRITE latency
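
For anyone who wants to check these take-aways against the tables,
here is a small sketch that subtracts the Test #2 latencies from the
Test #1 latencies and computes the change in 4KB read throughput. The
numbers are copied from the iozone output above; the names TEST1,
TEST2, savings() and GAIN_PCT are only illustrative.

```python
# (write, rewrite, read, reread) in microseconds per operation,
# keyed by record length in KB, copied from the tables above.
TEST1 = {
    1: (61, 62, 51, 51), 2: (63, 62, 51, 51), 4: (64, 63, 52, 51),
    8: (67, 66, 54, 52), 16: (71, 70, 56, 56), 32: (83, 80, 63, 63),
    64: (104, 100, 83, 82),
}
TEST2 = {
    1: (41, 43, 37, 37), 2: (44, 44, 37, 37), 4: (61, 59, 41, 41),
    8: (63, 62, 43, 43), 16: (68, 66, 47, 47), 32: (76, 72, 53, 53),
    64: (100, 95, 70, 70),
}


def savings(before, after):
    """Per-column latency saved (before minus after) for each record length."""
    return {
        reclen: tuple(b - a for b, a in zip(before[reclen], after[reclen]))
        for reclen in before
    }


SAVED = savings(TEST1, TEST2)
for reclen, (w, rw, r, rr) in sorted(SAVED.items()):
    print(f"{reclen:>3}KB  write -{w}us  rewrite -{rw}us  read -{r}us  reread -{rr}us")

# "Children see throughput for 12 readers", in ops/sec, from each test.
OPS_TEST1 = 84198.24
OPS_TEST2 = 111520.52
GAIN_PCT = (OPS_TEST2 / OPS_TEST1 - 1) * 100
print(f"12-reader throughput gain: {GAIN_PCT:.1f}%")
```

The read columns show roughly 10us saved at 8KB and above, while the
write columns save about 20us at 1KB and 2KB (where Test #2 is pure
inline) but only a few microseconds once the RDMA Read is needed.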
---
Chuck Lever (22):
xprtrdma: Eliminate INLINE_THRESHOLD macros
SUNRPC: Refactor rpc_xdr_buf_init()
SUNRPC: Generalize the RPC buffer allocation API
SUNRPC: Generalize the RPC buffer release API
SUNRPC: Separate buffer pointers for RPC Call and Reply messages
SUNRPC: Add a transport-specific private field in rpc_rqst
xprtrdma: Initialize separate RPC call and reply buffers
xprtrdma: Use smaller buffers for RPC-over-RDMA headers
xprtrdma: Replace DMA_BIDIRECTIONAL
xprtrdma: Delay DMA mapping Send and Receive buffers
xprtrdma: Eliminate "ia" argument in rpcrdma_{alloc,free}_regbuf
xprtrdma: Simplify rpcrdma_ep_post_recv()
xprtrdma: Move send_wr to struct rpcrdma_req
xprtrdma: Move recv_wr to struct rpcrdma_rep
xprtrmda: Report address of frmr, not mw
rpcrdma: RDMA/CM private message data structure
xprtrdma: Client-side support for rpcrdma_connect_private
xprtrdma: Basic support for Remote Invalidation
xprtrdma: Use gathered Send for large inline messages
xprtrdma: Support larger inline thresholds
xprtrdma: Rename rpcrdma_receive_wc()
xprtrdma: Eliminate rpcrdma_receive_worker()
include/linux/sunrpc/rpc_rdma.h | 39 ++++
include/linux/sunrpc/sched.h | 4
include/linux/sunrpc/xdr.h | 10 +
include/linux/sunrpc/xprt.h | 12 +
include/linux/sunrpc/xprtrdma.h | 4
net/sunrpc/backchannel_rqst.c | 8 -
net/sunrpc/clnt.c | 36 +--
net/sunrpc/sched.c | 36 ++-
net/sunrpc/sunrpc.h | 2
net/sunrpc/xprt.c | 2
net/sunrpc/xprtrdma/backchannel.c | 48 ++--
net/sunrpc/xprtrdma/fmr_ops.c | 7 -
net/sunrpc/xprtrdma/frwr_ops.c | 27 ++-
net/sunrpc/xprtrdma/rpc_rdma.c | 299 ++++++++++++++++++++--------
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 19 +-
net/sunrpc/xprtrdma/transport.c | 201 +++++++++++--------
net/sunrpc/xprtrdma/verbs.c | 238 +++++++++++++---------
net/sunrpc/xprtrdma/xprt_rdma.h | 102 ++++++----
net/sunrpc/xprtsock.c | 23 +-
19 files changed, 700 insertions(+), 417 deletions(-)
--
Chuck Lever