From: Mike Snitzer <snitzer@kernel.org>
To: linux-nfs@vger.kernel.org
Cc: Jeff Layton <jlayton@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
NeilBrown <neilb@suse.de>,
snitzer@hammerspace.com
Subject: [PATCH v11 20/20] nfs: add Documentation/filesystems/nfs/localio.rst
Date: Tue, 2 Jul 2024 12:28:31 -0400 [thread overview]
Message-ID: <20240702162831.91604-21-snitzer@kernel.org> (raw)
In-Reply-To: <20240702162831.91604-1-snitzer@kernel.org>
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server (both v3 and v4) to allow a
client and server to reliably handshake to determine if they are on the
same host. The LOCALIO auxiliary protocol's implementation, which uses
the same connection as NFS traffic, follows the pattern established by
the NFS ACL protocol extension.
The robust handshake between local client and server is just the
beginning, the ultimate usecase this locality makes possible is the
client is able to issue reads, writes and commits directly to the server
without having to go over the network. This is particularly useful for
container usecases (e.g. kubernetes) where it is possible to run an IO
job local to the server.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++++++++++++
include/linux/nfslocalio.h | 2 +
2 files changed, 137 insertions(+)
create mode 100644 Documentation/filesystems/nfs/localio.rst
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..7f211e3fc34c
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,135 @@
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO auxiliary RPC protocol
+added to the Linux NFS client and server (both v3 and v4) to allow a
+client and server to reliably handshake to determine if they are on the
+same host. The LOCALIO auxiliary protocol's implementation, which uses
+the same connection as NFS traffic, follows the pattern established by
+the NFS ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning, the ultimate usecase this locality makes possible is the
+client is able to issue reads, writes and commits directly to the server
+without having to go over the network. This is particularly useful for
+container usecases (e.g. kubernetes) where it is possible to run an IO
+job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
+- With localio:
+ read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+- Without localio:
+ read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
+method that allows the Linux NFS client to retrieve a Linux NFS server's
+uuid. This protocol isn't part of an IETF standard, nor does it need to
+be considering it is Linux-to-Linux auxiliary RPC protocol that amounts
+to an implementation detail.
+
+The GETUUID method encodes the server's uuid_t in terms of the fixed
+UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR
+methods are used instead of the less efficient variable sized methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization 400122 nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is:
+
+/* raw RFC 9562 UUID */
+#define UUID_SIZE 16
+typedef u8 uuid_t<UUID_SIZE>;
+
+program NFS_LOCALIO_PROGRAM {
+ version LOCALIO_V1 {
+ void
+ NULL(void) = 0;
+
+ uuid_t
+ GETUUID(void) = 1;
+ } = 1;
+} = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will operate faster.
+
+NFS Common and Server
+---------------------
+
+Localio is used by nfsd to add access to a global nfsd_uuids list in
+nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd for the client specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement, as such it has
+members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace, through it the client can
+access nn->nfsd_serv with proper rcu read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to nfs client's running on the same local
+host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
+LOCALIO protocol and check if the server with that uuid is known to be
+local. This ensures client and server 1: support localio 2: are local
+to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of nfsd_uuid_t struct to allow a client local to a server to
+open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that client no longer sees
+valid nfsd objects (be it struct net or nn->nfsd_serv) it returns ENXIO
+to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container were
+to reboot while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO auxiliary protocol and associated NFS localio read, write
+and commit access have proven stable against various test scenarios but
+these have not yet been formalized in any testsuite:
+
+- Client and server both on localhost (for both v3 and v4.2).
+
+- Various permutations of client and server support enablement for
+ both local and remote client and server. Testing against NFS storage
+ products that don't support the LOCALIO protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2)
+ The container testing was in terms of podman managed containers and
+ includes container stop/restart scenario.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 22443d2089eb..e8e3117abb5f 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -21,6 +21,8 @@ extern struct list_head nfsd_uuids;
* Each nfsd instance has an nfsd_uuid_t that is accessible through the
* global nfsd_uuids list. Useful to allow a client to negotiate if localio
* possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
*/
typedef struct {
uuid_t uuid;
--
2.44.0
next prev parent reply other threads:[~2024-07-02 16:28 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 02/20] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 03/20] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 04/20] nfsd: add "localio" support Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 05/20] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 06/20] nfsd: manage netns reference in nfsd_open_local_fh Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 07/20] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 08/20] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 09/20] SUNRPC: replace program list with program array Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 10/20] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 11/20] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 12/20] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 13/20] nfs: add "localio" support Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 14/20] nfs: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 15/20] nfs: enable localio for non-pNFS I/O Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 16/20] pnfs/flexfiles: enable localio for flexfiles I/O Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 17/20] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 18/20] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 19/20] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-07-02 16:28 ` Mike Snitzer [this message]
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
2024-07-02 18:32 ` Mike Snitzer
2024-07-02 20:10 ` Chuck Lever III
2024-07-03 0:57 ` Mike Snitzer
2024-07-03 0:52 ` NeilBrown
2024-07-03 1:13 ` Mike Snitzer
2024-07-03 5:04 ` Christoph Hellwig
2024-07-03 8:52 ` Mike Snitzer
2024-07-03 14:16 ` Christoph Hellwig
2024-07-03 15:11 ` Mike Snitzer
2024-07-03 15:18 ` Christoph Hellwig
2024-07-03 15:24 ` Chuck Lever III
2024-07-03 15:29 ` Christoph Hellwig
2024-07-03 15:36 ` Mike Snitzer
2024-07-03 17:06 ` Jeff Layton
2024-07-04 6:00 ` Christoph Hellwig
2024-07-04 18:31 ` Mike Snitzer
2024-07-05 5:18 ` Christoph Hellwig
2024-07-05 13:35 ` Chuck Lever III
2024-07-05 13:39 ` Christoph Hellwig
2024-07-05 14:15 ` Mike Snitzer
2024-07-05 14:18 ` Christoph Hellwig
2024-07-05 14:36 ` Mike Snitzer
2024-07-05 14:59 ` Chuck Lever III
2024-07-06 3:58 ` Mike Snitzer
2024-07-06 5:52 ` NeilBrown
2024-07-06 13:05 ` "why NFSv3?" [was: Re: [PATCH v11 00/20] nfs/nfsd: add support for localio] Mike Snitzer
2024-07-06 5:58 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Christoph Hellwig
2024-07-06 13:12 ` Mike Snitzer
2024-07-08 9:46 ` Christoph Hellwig
2024-07-08 12:55 ` Mike Snitzer
2024-07-06 16:58 ` Chuck Lever III
2024-07-07 0:42 ` Mike Snitzer
2024-07-05 18:59 ` Jeff Layton
2024-07-05 22:08 ` NeilBrown
2024-07-06 6:02 ` Christoph Hellwig
2024-07-06 6:37 ` NeilBrown
2024-07-06 6:42 ` Christoph Hellwig
2024-07-06 17:15 ` Chuck Lever III
2024-07-08 4:10 ` NeilBrown
2024-07-08 14:41 ` Chuck Lever III
2024-07-08 9:40 ` Christoph Hellwig
2024-07-08 4:03 ` NeilBrown
2024-07-08 9:37 ` Christoph Hellwig
2024-07-10 0:10 ` NeilBrown
2024-07-03 17:19 ` Chuck Lever III
2024-07-03 19:04 ` Mike Snitzer
2024-07-04 5:55 ` Christoph Hellwig
2024-07-03 21:35 ` NeilBrown
2024-07-04 6:01 ` Christoph Hellwig
2024-07-04 10:13 ` Jeff Layton
2024-07-03 15:26 ` Chuck Lever III
2024-07-03 15:37 ` Mike Snitzer
2024-07-03 15:16 ` Christoph Hellwig
2024-07-03 15:28 ` Mike Snitzer
2024-07-04 5:49 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240702162831.91604-21-snitzer@kernel.org \
--to=snitzer@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=snitzer@hammerspace.com \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox