public inbox for linux-nfs@vger.kernel.org
From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org, Jeff Layton <jlayton@kernel.org>,
	Trond Myklebust <trondmy@hammerspace.com>,
	NeilBrown <neilb@suse.de>,
	snitzer@hammerspace.com
Subject: Re: [PATCH v6 17/18] nfs: add Documentation/filesystems/nfs/localio.rst
Date: Thu, 20 Jun 2024 10:33:15 -0400
Message-ID: <ZnQ9q9n1wJrBNRC9@kernel.org>
In-Reply-To: <ZnQ0FSQHJLPHxRsP@tissot.1015granger.net>

On Thu, Jun 20, 2024 at 09:52:21AM -0400, Chuck Lever wrote:
> On Wed, Jun 19, 2024 at 04:40:31PM -0400, Mike Snitzer wrote:
> > This document gives an overview of the LOCALIO auxiliary RPC protocol
> > added to the Linux NFS client and server (both v3 and v4) to allow a
> > client and server to reliably handshake to determine if they are on the
> > same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > the same connection as NFS traffic, follows the pattern established by
> > the NFS ACL protocol extension.
> > 
> > The robust handshake between local client and server is just the
> > beginning; the ultimate use case this locality makes possible is for the
> > client to issue reads, writes and commits directly to the server
> > without having to go over the network.  This is particularly useful for
> > container use cases (e.g. Kubernetes) where it is possible to run an IO
> > job local to the server.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  Documentation/filesystems/nfs/localio.rst | 148 ++++++++++++++++++++++
> >  include/linux/nfslocalio.h                |   2 +
> >  2 files changed, 150 insertions(+)
> >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > 
> > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > new file mode 100644
> > index 000000000000..a43c3dab2cab
> > --- /dev/null
> > +++ b/Documentation/filesystems/nfs/localio.rst
> > @@ -0,0 +1,148 @@
> > +===========
> > +NFS localio
> > +===========
> > +
> > +This document gives an overview of the LOCALIO auxiliary RPC protocol
> > +added to the Linux NFS client and server (both v3 and v4) to allow a
> > +client and server to reliably handshake to determine if they are on the
> > +same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > +the same connection as NFS traffic, follows the pattern established by
> > +the NFS ACL protocol extension.
> > +
> > +The LOCALIO auxiliary protocol is needed to allow robust discovery of
> > +clients local to their servers.  Prior to this LOCALIO protocol a
> > +fragile sockaddr network address based match against all local network
> > +interfaces was attempted.  But unlike the LOCALIO protocol, the
> > +sockaddr-based matching didn't handle use of iptables or containers.
> > +
> > +The robust handshake between local client and server is just the
> > +beginning; the ultimate use case this locality makes possible is for the
> > +client to issue reads, writes and commits directly to the server
> > +without having to go over the network.  This is particularly useful for
> > +container use cases (e.g. Kubernetes) where it is possible to run an IO
> > +job local to the server.
> > +
> > +The performance advantage realized from localio's ability to bypass
> > +using XDR and RPC for reads, writes and commits can be extreme, e.g.
> > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8:
> > +-  With localio:
> > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > +-  Without localio:
> > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > +
> > +RPC
> > +---
> > +
> > +The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
> > +method that allows the Linux nfs client to retrieve a Linux nfs server's
> > +uuid.  This protocol isn't part of an IETF standard, nor does it need to
> > +be, considering it is a Linux-to-Linux auxiliary RPC protocol that amounts
> > +to an implementation detail.
> > +
> > +The GETUUID method encodes the server's uuid_t in terms of the fixed
> > +UUID_SIZE (16 bytes).  The fixed size opaque encode and decode XDR
> > +methods are used instead of the less efficient variable sized methods.
> > +
> > +The RPC program number for the NFS_LOCALIO_PROGRAM is currently defined
> > +as 0x20000002 (but a request for a unique RPC program number assignment
> > +has been submitted to IANA.org).
> > +
> > +The following approximately describes the LOCALIO protocol in a pseudo
> > +rpcgen .x syntax:
> > +
> > +#define UUID_SIZE 16
> > +typedef u8 uuid_t<UUID_SIZE>;
> > +
> > +program NFS_LOCALIO_PROGRAM {
> > +     version NULLVERS {
> > +        void NULL(void) = 0;
> > +	} = 1;
> > +     version GETUUIDVERS {
> > +        uuid_t GETUUID(void) = 1;
> > +	} = 1;
> > +} = 0x20000002;
> > +
> > +The above is the skeleton for the LOCALIO protocol; it doesn't account
> > +for NFS v3 and v4 RPC boilerplate (which also marshals RPC status) that
> > +is used to implement GETUUID.
> > +
> > +Here are the respective XDR results for nfsd and nfs:
> 
> Hi Mike!
> 
> A protocol spec describes the on-the-wire data formats, not the
> in-memory structure layouts. The below C structures are not
> relevant to this specification. This should be all you need here,
> if I understand your protocol correctly:
> 
> /* raw RFC 9562 UUID */
> #define UUID_SIZE 16
> typedef u8 uuid_t<UUID_SIZE>;
> 
> union GETUUID1res switch (uint32 status) {
> case 0:
>     uuid_t  uuid;
> default:
>     void;
> };
> 
> program NFS_LOCALIO_PROGRAM {
>     version LOCALIO_V1 {
>         void
>             NULL(void) = 0;
> 
>         GETUUID1res
>             GETUUID(void) = 1;
>     } = 1;
> } = 0x20000002;

Thanks for this, nice to see I wasn't too far off.
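
As a sanity check of the wire format, here is a minimal userspace C
sketch of the GETUUID1res reply body quoted above.  It assumes standard
XDR encoding (big-endian 32-bit status) and, as the quoted documentation
text says, a fixed-size 16-byte opaque for the UUID, so no length word
and no padding follow the status.  The function names are hypothetical
and are not the helpers used in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* Hypothetical helper: store a 32-bit value in XDR (big-endian) order. */
void xdr_put_u32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Hypothetical helper: load a 32-bit value stored in XDR order. */
uint32_t xdr_get_u32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Encode a GETUUID1res body: status, then (only if status == 0) the raw
 * 16-byte UUID.  Returns the number of bytes written, or 0 if buf is
 * too small.  Assumption: the UUID is a fixed-size opaque.
 */
size_t getuuid1res_encode(uint8_t *buf, size_t len, uint32_t status,
			  const uint8_t uuid[UUID_SIZE])
{
	size_t need = 4 + (status == 0 ? UUID_SIZE : 0);

	assert(buf != NULL);
	if (len < need)
		return 0;
	xdr_put_u32(buf, status);
	if (status == 0)
		memcpy(buf + 4, uuid, UUID_SIZE);
	return need;
}

/*
 * Decode a GETUUID1res body.  Returns 0 on success (check *status to
 * see whether uuid was filled in), -1 if the buffer is too short.
 */
int getuuid1res_decode(const uint8_t *buf, size_t len, uint32_t *status,
		       uint8_t uuid[UUID_SIZE])
{
	if (len < 4)
		return -1;
	*status = xdr_get_u32(buf);
	if (*status != 0)
		return 0;	/* default arm of the union: void */
	if (len < 4 + UUID_SIZE)
		return -1;
	memcpy(uuid, buf + 4, UUID_SIZE);
	return 0;
}
```

Under these assumptions a successful reply body is 20 bytes (4 for the
status plus 16 for the UUID), and any non-zero status is just the 4-byte
status word.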

> Then you need to discuss transport considerations:
> 
> - Whether this protocol is registered with the server's rpcbind
>   service,

It isn't, should it be?  Not familiar with what needs updating to do
it, but happy to work through it.

> - Which TCP/UDP port number does it use? Assuming 2049, and that
>   it will appear on the same transport connection as NFS traffic
>   (just like NFACL).

Correct.
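
To illustrate what sharing the NFS connection means in practice, here
is a rough C sketch of a GETUUID call as it might appear as an ONC RPC
(RFC 5531) record on that TCP stream.  It assumes the LOCALIO_V1
numbering from the .x above, and uses AUTH_NONE credentials purely to
keep the example short; the auth flavor and the function name are
assumptions, not necessarily what the patches do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFS_LOCALIO_PROGRAM	0x20000002u	/* current value, pending IANA */
#define LOCALIO_V1		1u
#define LOCALIO_PROC_GETUUID	1u
#define RPC_MSG_CALL		0u		/* msg_type CALL */
#define RPC_PROTO_VERSION	2u		/* ONC RPC version */
#define AUTH_NONE_FLAVOR	0u		/* assumption: AUTH_NONE */
#define RPC_LAST_FRAGMENT	0x80000000u	/* TCP record-marking flag */

/* Store a 32-bit value in network (big-endian) byte order. */
void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: build one TCP record holding a GETUUID call.
 * GETUUID takes void, so the record is just the RPC call header.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t localio_build_getuuid_call(uint8_t *buf, size_t len, uint32_t xid)
{
	const uint32_t words[] = {
		xid,
		RPC_MSG_CALL,
		RPC_PROTO_VERSION,
		NFS_LOCALIO_PROGRAM,
		LOCALIO_V1,
		LOCALIO_PROC_GETUUID,
		AUTH_NONE_FLAVOR, 0,	/* cred: flavor, zero-length body */
		AUTH_NONE_FLAVOR, 0,	/* verf: flavor, zero-length body */
	};
	const size_t nwords = sizeof(words) / sizeof(words[0]);
	const size_t body = nwords * 4;
	size_t i;

	assert(buf != NULL);
	if (len < 4 + body)
		return 0;
	/* Record mark: last-fragment bit plus fragment length. */
	put_be32(buf, RPC_LAST_FRAGMENT | (uint32_t)body);
	for (i = 0; i < nwords; i++)
		put_be32(buf + 4 + 4 * i, words[i]);
	return 4 + body;
}
```

In this sketch the only things that tell the call apart from an NFS call
on the same connection are the program, version and procedure fields in
the RPC header.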
 
> Should it be supported on port 20049 with RDMA as well?

Unless there is some additional code needed, I don't see why it
wouldn't.  But I haven't tested it (will look at NFS's RDMA support
and wrap my head around it).

> > +Testing
> > +-------
> > +
> > +The LOCALIO auxiliary protocol and associated NFS localio read, write
> > +and commit access have proven stable against various test scenarios but
> > +these have not yet been formalized in any testsuite:
> 
> Is there anywhere that describes what is needed to set up clients
> and a server to do local I/O? Then running the usual suite of NFS
> tests on that setup and comparing the nfsstat output on the local
> and remote clients should be a basic "smoke test" kind of thing
> that maintainers can use as a check-in test.

I just figured running nfsd and nfs client connecting to that
localhost was obvious.  But I can fill in more howto like info in this
section.

What is "the usual suite of NFS tests"?  I should run them ;)

(apologies if there are well-established docs with pointers, still
learning to fish, thanks for your help!)

Mike
