From: Mike Snitzer <snitzer@kernel.org>
To: linux-nfs@vger.kernel.org
Cc: Jeff Layton <jlayton@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
NeilBrown <neilb@suse.de>,
linux-fsdevel@vger.kernel.org
Subject: [PATCH v12 19/24] nfs: add Documentation/filesystems/nfs/localio.rst
Date: Mon, 19 Aug 2024 14:17:24 -0400 [thread overview]
Message-ID: <20240819181750.70570-20-snitzer@kernel.org> (raw)
In-Reply-To: <20240819181750.70570-1-snitzer@kernel.org>
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server to allow them to reliably
handshake to determine if they are on the same host.
Once an NFS client and server handshake as "local", the client will
bypass the network RPC protocol for read, write and commit operations.
Due to this XDR and RPC bypass, these operations will operate faster.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
Documentation/filesystems/nfs/localio.rst | 178 ++++++++++++++++++++++
include/linux/nfslocalio.h | 2 +
2 files changed, 180 insertions(+)
create mode 100644 Documentation/filesystems/nfs/localio.rst
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..d8bdab88f1db
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,178 @@
+===========
+NFS LOCALIO
+===========
+
+Overview
+========
+
+The LOCALIO auxiliary RPC protocol allows the Linux NFS client and
+server to reliably handshake to determine if they are on the same host.
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will operate faster.
+
+The LOCALIO auxiliary protocol's implementation, which uses the same
+connection as NFS traffic, follows the pattern established by the NFS
+ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning, the ultimate use case this locality makes possible is the
+client is able to open files and issue reads, writes and commits
+directly to the server without having to go over the network. The
+requirement is to perform these loopback NFS operations as efficiently
+as possible, this is particularly useful for container use cases
+(e.g. kubernetes) where it is possible to run an IO job local to the
+server.
+
+The performance advantage realized from LOCALIO's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+fio for 20 secs with 24 libaio threads, 128k directio reads, qd of 8,
+- With LOCALIO:
+ read: IOPS=311k, BW=38.0GiB/s (40.8GB/s)(760GiB/20001msec)
+- Without LOCALIO:
+ read: IOPS=12.0k, BW=1495MiB/s (1568MB/s)(29.2GiB/20015msec)
+
+RPC
+===
+
+The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
+RPC method that allows the Linux NFS client to verify the local Linux
+NFS server can see the nonce (single-use UUID) the client generated and
+made available in nfs_common. This protocol isn't part of an IETF
+standard, nor does it need to be considering it is Linux-to-Linux
+auxiliary RPC protocol that amounts to an implementation detail.
+
+The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
+the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
+XDR methods are used instead of the less efficient variable sized
+methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization 400122 nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is:
+
+/* raw RFC 9562 UUID */
+#define UUID_SIZE 16
+typedef u8 uuid_t<UUID_SIZE>;
+
+program NFS_LOCALIO_PROGRAM {
+ version LOCALIO_V1 {
+ void
+ NULL(void) = 0;
+
+ void
+ UUID_IS_LOCAL(uuid_t) = 1;
+ } = 1;
+} = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+NFS Common and Client/Server Handshake
+======================================
+
+fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client
+to generate a nonce (single-use UUID) and associated short-lived
+nfs_uuid_t struct, register it with nfs_common for subsequent lookup and
+verification by the NFS server and if matched the NFS server populates
+members in the nfs_uuid_t struct. The nfs client then transfers these
+nfs_uuid_t struct member pointers to the nfs_client struct and cleans up
+the nfs_uuid_t struct. See: fs/nfs/localio.c:nfs_local_probe()
+
+nfs_common's nfs_uuids list is the basis for LOCALIO enablement, as such
+it has members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace, through it the client can
+access nn->nfsd_serv with proper rcu read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to nfs client's running on the same local
+host.
+
+NFS Client issues IO instead of Server
+======================================
+
+Because LOCALIO is focused on protocol bypass to achieve improved IO
+performance alternatives to traditional NFS wire protocol (SUNRPC with
+XDR) to access the backing filesystem must be provided.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of select nfs server objects to allow a client local to a
+server to open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that client no longer sees
+valid nfsd objects (be it struct net or nn->nfsd_serv) it returns ENXIO
+to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container were
+to reboot while a LOCALIO client is connected to it.
+
+Once the client has an open file pointer it will issue reads, writes and
+commits directly to the underlying local filesystem (normally done by
+the nfs server). As such, for these operations, the NFS client is
+issuing IO to the underlying local filesystem that it is sharing with
+the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
+fs/nfs/localio.c:nfs_local_commit().
+
+Security
+========
+
+Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
+AUTH_SYS) is used.
+
+Care is taken to ensure the same NFS security mechanisms are used
+(authentication, etc) regardless of whether LOCALIO or regular NFS
+access is used. The auth_domain established as part of the traditional
+NFS client access to the NFS server is also used for LOCALIO.
+
+Relative to containers, LOCALIO gives the client access to the network
+namespace the server has. This is required to allow the client to access
+the server's per-namespace nfsd_net struct. With traditional NFS, the
+client is afforded this same level of access (albeit in terms of the NFS
+protocol via SUNRPC). No other namespaces (user, mount, etc) have been
+altered or purposely extended from the server to the client.
+
+Testing
+=======
+
+The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write
+and commit access have proven stable against various test scenarios:
+
+- Client and server both on the same host.
+
+- All permutations of client and server support enablement for both
+ local and remote client and server.
+
+- Testing against NFS storage products that don't support the LOCALIO
+ protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+ The container testing was in terms of podman managed containers and
+ includes successful container stop/restart scenario.
+
+- Formalizing these test scenarios in terms of existing test
+ infrastructure is on-going. Initial regular coverage is provided in
+ terms of ktest running xfstests against a LOCALIO-enabled NFS loopback
+ mount configuration, and includes lockdep and KASAN coverage, see:
+ https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next
+ https://github.com/koverstreet/ktest
+
+- Various kdevops testing (in terms of "Chuck's BuildBot") has been
+ performed to regularly verify the LOCALIO changes haven't caused any
+ regressions to non-LOCALIO NFS use cases.
+
+- All of Hammerspace's various sanity tests pass with LOCALIO enabled
+ (this includes numerous pNFS and flexfiles tests).
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 109cb8534e3f..c6edcb7c0dcd 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -15,6 +15,8 @@
/*
* Useful to allow a client to negotiate if localio
* possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
*/
typedef struct {
uuid_t uuid;
--
2.44.0
next prev parent reply other threads:[~2024-08-19 18:18 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-19 18:17 [PATCH v12 00/24] nfs/nfsd: add support for localio Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 01/24] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 02/24] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 03/24] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 04/24] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 05/24] nfsd: fix nfsfh tracepoints to properly handle NULL rqstp Mike Snitzer
2024-08-21 17:46 ` Jeff Layton
2024-08-21 21:23 ` Mike Snitzer
2024-08-22 15:07 ` Chuck Lever
2024-08-22 16:04 ` Mike Snitzer
2024-08-22 17:07 ` Jeff Layton
2024-08-22 17:20 ` Mike Snitzer
2024-08-22 18:14 ` Chuck Lever III
2024-08-19 18:17 ` [PATCH v12 06/24] nfsd: add nfsd_file_acquire_local() Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 07/24] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 08/24] SUNRPC: add rpcauth_map_clnt_to_svc_cred_local Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 09/24] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-08-21 18:04 ` Jeff Layton
2024-08-21 18:39 ` Jeff Layton
2024-08-19 18:17 ` [PATCH v12 10/24] nfsd: add localio support Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 11/24] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 12/24] SUNRPC: replace program list with program array Mike Snitzer
2024-08-21 18:31 ` Jeff Layton
2024-08-21 20:40 ` Mike Snitzer
2024-08-21 21:43 ` Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 13/24] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 14/24] nfs: add localio support Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 15/24] nfs: enable localio for non-pNFS IO Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 16/24] pnfs/flexfiles: enable localio support Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 17/24] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 18/24] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-19 18:17 ` Mike Snitzer [this message]
2024-08-19 18:17 ` [PATCH v12 20/24] nfsd: use GC for nfsd_file returned by nfsd_file_acquire_local Mike Snitzer
2024-08-21 18:34 ` Jeff Layton
2024-08-19 18:17 ` [PATCH v12 21/24] nfs_common: expose localio's required nfsd symbols to nfs client Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 22/24] nfs: push localio nfsd_file_put call out to client Mike Snitzer
2024-08-21 18:50 ` Jeff Layton
2024-08-19 18:17 ` [PATCH v12 23/24] nfs: switch client to use nfsd_file for localio Mike Snitzer
2024-08-19 18:17 ` [PATCH v12 24/24] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-21 19:03 ` Jeff Layton
2024-08-21 20:12 ` Mike Snitzer
2024-08-21 20:14 ` Mike Snitzer
2024-08-21 23:46 ` Jeff Layton
2024-08-19 18:29 ` [PATCH v12 00/24] nfs/nfsd: add support for localio Chuck Lever III
2024-08-19 18:43 ` Mike Snitzer
2024-08-21 19:20 ` Jeff Layton
2024-08-21 20:05 ` Mike Snitzer
2024-08-22 12:35 ` Jeff Layton
2024-08-22 2:00 ` Mike Snitzer
2024-08-22 12:50 ` Jeff Layton
2024-08-22 15:18 ` Chuck Lever III
2024-08-22 15:42 ` Mike Snitzer
2024-08-21 19:56 ` Chuck Lever
2024-08-21 20:10 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240819181750.70570-20-snitzer@kernel.org \
--to=snitzer@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).