From: Jeff Layton <jlayton@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>, linux-nfs@vger.kernel.org
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	NeilBrown <neilb@suse.de>,
	snitzer@hammerspace.com
Subject: Re: [PATCH v7 19/20] nfs: add Documentation/filesystems/nfs/localio.rst
Date: Tue, 25 Jun 2024 07:59:31 -0400
Message-ID: <2dfb2b239031ac4fd34996fdb3d404d1160f2158.camel@kernel.org>
In-Reply-To: <20240624162741.68216-20-snitzer@kernel.org>

On Mon, 2024-06-24 at 12:27 -0400, Mike Snitzer wrote:
> This document gives an overview of the LOCALIO auxiliary RPC protocol
> added to the Linux NFS client and server (both v3 and v4) to allow a
> client and server to reliably handshake to determine if they are on the
> same host.  The LOCALIO auxiliary protocol's implementation, which uses
> the same connection as NFS traffic, follows the pattern established by
> the NFS ACL protocol extension.
> 
> The robust handshake between local client and server is just the
> beginning, the ultimate usecase this locality makes possible is the
> client is able to issue reads, writes and commits directly to the server
> without having to go over the network.  This is particularly useful for
> container usecases (e.g. kubernetes) where it is possible to run an IO
> job local to the server.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  Documentation/filesystems/nfs/localio.rst | 134 ++++++++++++++++++++++
>  include/linux/nfslocalio.h                |   2 +
>  2 files changed, 136 insertions(+)
>  create mode 100644 Documentation/filesystems/nfs/localio.rst
> 
> diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> new file mode 100644
> index 000000000000..e856b6273e78
> --- /dev/null
> +++ b/Documentation/filesystems/nfs/localio.rst
> @@ -0,0 +1,134 @@
> +===========
> +NFS localio
> +===========
> +
> +This document gives an overview of the LOCALIO auxiliary RPC protocol
> +added to the Linux NFS client and server (both v3 and v4) to allow a
> +client and server to reliably handshake to determine if they are on the
> +same host.  The LOCALIO auxiliary protocol's implementation, which uses
> +the same connection as NFS traffic, follows the pattern established by
> +the NFS ACL protocol extension.
> +
> +The LOCALIO auxiliary protocol is needed to allow robust discovery of
> +clients local to their servers.  Prior to this LOCALIO protocol a
> +fragile sockaddr network address based match against all local network
> +interfaces was attempted.  But unlike the LOCALIO protocol, the
> +sockaddr-based matching didn't handle use of iptables or containers.
> +

The above paragraph reads as if there was an earlier implementation in
mainline kernels that used address matching. It might be good to point
out that it was a private implementation.

> +The robust handshake between local client and server is just the
> +beginning, the ultimate usecase this locality makes possible is the
> +client is able to issue reads, writes and commits directly to the server
> +without having to go over the network.  This is particularly useful for
> +container usecases (e.g. kubernetes) where it is possible to run an IO
> +job local to the server.
> +
> +The performance advantage realized from localio's ability to bypass
> +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> +-  With localio:
> +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> +-  Without localio:
> +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> +
> +RPC
> +---
> +
> +The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
> +method that allows the Linux NFS client to retrieve a Linux NFS server's
> +uuid.  This protocol isn't part of an IETF standard, nor does it need to
> +be considering it is Linux-to-Linux auxiliary RPC protocol that amounts
> +to an implementation detail.
> +
> +The GETUUID method encodes the server's uuid_t in terms of the fixed
> +UUID_SIZE (16 bytes).  The fixed size opaque encode and decode XDR
> +methods are used instead of the less efficient variable sized methods.
> +
> +The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
> +by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
> +Linux Kernel Organization       400122  nfslocalio
> +

Nice! Glad this got officially registered fast.

> +The LOCALIO protocol spec in rpcgen syntax is:
> +
> +/* raw RFC 9562 UUID */
> +#define UUID_SIZE 16
> +typedef u8 uuid_t<UUID_SIZE>;
> +
> +program NFS_LOCALIO_PROGRAM {
> +    version LOCALIO_V1 {
> +        void
> +            NULL(void) = 0;
> +
> +        uuid_t
> +            GETUUID(void) = 1;
> +    } = 1;
> +} = 400122;
> +
> +LOCALIO uses the same transport connection as NFS traffic.  As such,
> +LOCALIO is not registered with rpcbind.
> +
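
To illustrate the fixed vs. variable sized opaque point above, here is a
minimal userspace sketch of the two XDR wire layouts. It is not taken
from the patch: the sketch_* names are hypothetical and only mimic what
the kernel's fixed-size opaque helpers are described as doing. A
fixed-length opaque is the raw bytes padded to a 4-byte boundary, while
a variable-length opaque carries an extra 4-byte big-endian length
first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* Round len up to the next multiple of 4, the XDR alignment unit. */
static size_t sketch_xdr_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: fixed-length opaque encoding.
 * Writes the raw bytes plus zero padding, with no length prefix.
 * Returns the number of bytes written to out.
 */
static size_t sketch_encode_opaque_fixed(uint8_t *out, const uint8_t *data,
					 size_t len)
{
	size_t padded = sketch_xdr_pad4(len);

	assert(out && data);
	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/*
 * Hypothetical helper: variable-length opaque encoding.
 * Writes a 4-byte big-endian length, then the padded bytes.
 * Returns the number of bytes written to out.
 */
static size_t sketch_encode_opaque_var(uint8_t *out, const uint8_t *data,
				       size_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + sketch_encode_opaque_fixed(out + 4, data, len);
}
```

For a 16-byte uuid the fixed form needs no padding and no length word,
so under these assumptions the GETUUID reply body would be 16 bytes
rather than 20.
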
> +Once an NFS client and server handshake as "local", the client will
> +bypass the network RPC protocol for read, write and commit operations.
> +Due to this XDR and RPC bypass, these operations will operate faster.
> +
> +NFS Common and Server
> +---------------------
> +
> +First use is in nfsd, to add access to a global nfsd_uuids list in
> +nfs_common that is used to register and then identify local nfsd
> +instances.
> +

First use of what? This sentence doesn't parse well.

> +nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
> +composed of nfsd_uuid_t instances that are managed as nfsd creates them
> +(per network namespace).
> +
> +nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
> +nfsd for the client specified nfsd uuid.
> +
> +The nfsd_uuids list is the basis for localio enablement, as such it has
> +members that point to nfsd memory for direct use by the client
> +(e.g. 'net' is the server's network namespace, through it the client can
> +access nn->nfsd_serv with proper rcu read access).  It is this client
> +and server synchronization that enables advanced usage and lifetime of
> +objects to span from the host kernel's nfsd to per-container knfsd
> +instances that are connected to nfs client's running on the same local
> +host.
> +
> +NFS Client
> +----------
> +
> +fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
> +LOCALIO protocol and check if the server with that uuid is known to be
> +local.  This ensures client and server 1: support localio 2: are local
> +to each other.
> +
> +See fs/nfs/localio.c:nfs_local_open_fh() and
> +fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
> +focused use of nfsd_uuid_t struct to allow a client local to a server to
> +open a file pointer without needing to go over the network.
> +
> +The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
> +server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
> +both the nfsd network namespace and the associated nn->nfsd_serv in
> +terms of RCU.  If nfsd_open_local_fh() finds that client no longer sees
> +valid nfsd objects (be it struct net or nn->nfsd_serv) it returns ENXIO
> +to nfs_local_open_fh() and the client will try to reestablish the
> +LOCALIO resources needed by calling nfs_local_probe() again.  This
> +recovery is needed if/when an nfsd instance running in a container were
> +to reboot while a localio client is connected to it.
> +
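
The ENXIO recovery described above can be pictured with a small
userspace model. This is only a sketch under stated assumptions, not
the code in the patch: every sketch_* name is hypothetical, a
"generation" counter stands in for the nfsd objects going away across a
container restart, and retrying the open once after the re-probe is an
assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an nfsd instance; generation bumps on restart. */
struct sketch_server {
	bool running;
	int generation;
};

/* Hypothetical stand-in for the client's cached localio state. */
struct sketch_client {
	bool local_enabled;
	int seen_generation;
};

/* Models the probe: learn whether the server is local and cache its state. */
static bool sketch_probe(struct sketch_client *c, const struct sketch_server *s)
{
	assert(c && s);
	c->local_enabled = s->running;
	if (s->running)
		c->seen_generation = s->generation;
	return c->local_enabled;
}

/* Models the server-side open: -ENXIO if the caller's view is stale. */
static int sketch_server_open(const struct sketch_server *s,
			      int client_generation)
{
	if (!s->running || client_generation != s->generation)
		return -ENXIO;
	return 0;
}

/* Models the client-side open: on -ENXIO, re-probe and retry once. */
static int sketch_client_open(struct sketch_client *c,
			      const struct sketch_server *s)
{
	int ret;

	if (!c->local_enabled)
		return -ENXIO;
	ret = sketch_server_open(s, c->seen_generation);
	if (ret != -ENXIO)
		return ret;
	/* Cached server objects are stale: re-establish via a fresh probe. */
	if (!sketch_probe(c, s))
		return -ENXIO;
	return sketch_server_open(s, c->seen_generation);
}
```

In this model a restarted server (new generation) is picked up by the
re-probe and the open succeeds, while a server that stays down leaves
localio disabled on the client.
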
> +Testing
> +-------
> +
> +The LOCALIO auxiliary protocol and associated NFS localio read, write
> +and commit access have proven stable against various test scenarios but
> +these have not yet been formalized in any testsuite:
> +
> +-  Client and server both on localhost (for both v3 and v4.2).
> +
> +-  Various permutations of client and server support enablement for
> +   both local and remote client and server.  Testing against NFS storage
> +   products that don't support the LOCALIO protocol was also performed.
> +
> +-  Client on host, server within a container (for both v3 and v4.2)
> +   The container testing was in terms of podman managed containers and
> +   includes container stop/restart scenario.
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index c9592ad0afe2..a9722e18b527 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
>   * Each nfsd instance has an nfsd_uuid_t that is accessible through the
>   * global nfsd_uuids list. Useful to allow a client to negotiate if localio
>   * possible with its server.
> + *
> + * See Documentation/filesystems/nfs/localio.rst for more detail.
>   */
>  typedef struct {
>  	uuid_t uuid;

-- 
Jeff Layton <jlayton@kernel.org>
