All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org, Jeff Layton <jlayton@kernel.org>,
	Trond Myklebust <trondmy@hammerspace.com>,
	snitzer@hammerspace.com
Subject: Re: [PATCH v6 17/18] nfs: add Documentation/filesystems/nfs/localio.rst
Date: Thu, 20 Jun 2024 20:38:45 -0400	[thread overview]
Message-ID: <ZnTLlRaD7-28onLr@kernel.org> (raw)
In-Reply-To: <ZnTJjKRiAHLz9GxG@kernel.org>

On Thu, Jun 20, 2024 at 08:30:04PM -0400, Mike Snitzer wrote:
> On Fri, Jun 21, 2024 at 09:42:26AM +1000, NeilBrown wrote:
> > On Fri, 21 Jun 2024, Chuck Lever wrote:
> > > On Thu, Jun 20, 2024 at 06:35:38PM -0400, Mike Snitzer wrote:
> > > > On Fri, Jun 21, 2024 at 08:12:56AM +1000, NeilBrown wrote:
> > > > > On Thu, 20 Jun 2024, Chuck Lever wrote:
> > > > > > On Wed, Jun 19, 2024 at 04:40:31PM -0400, Mike Snitzer wrote:
> > > > > > > This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > > > > > added to the Linux NFS client and server (both v3 and v4) to allow a
> > > > > > > client and server to reliably handshake to determine if they are on the
> > > > > > > same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > > > > > > the same connection as NFS traffic, follows the pattern established by
> > > > > > > the NFS ACL protocol extension.
> > > > > > > 
> > > > > > > The robust handshake between local client and server is just the
> > > > > > > beginning, the ultimate usecase this locality makes possible is the
> > > > > > > client is able to issue reads, writes and commits directly to the server
> > > > > > > without having to go over the network.  This is particularly useful for
> > > > > > > container usecases (e.g. kubernetes) where it is possible to run an IO
> > > > > > > job local to the server.
> > > > > > > 
> > > > > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > > > > ---
> > > > > > >  Documentation/filesystems/nfs/localio.rst | 148 ++++++++++++++++++++++
> > > > > > >  include/linux/nfslocalio.h                |   2 +
> > > > > > >  2 files changed, 150 insertions(+)
> > > > > > >  create mode 100644 Documentation/filesystems/nfs/localio.rst
> > > > > > > 
> > > > > > > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..a43c3dab2cab
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/filesystems/nfs/localio.rst
> > > > > > > @@ -0,0 +1,148 @@
> > > > > > > +===========
> > > > > > > +NFS localio
> > > > > > > +===========
> > > > > > > +
> > > > > > > +This document gives an overview of the LOCALIO auxiliary RPC protocol
> > > > > > > +added to the Linux NFS client and server (both v3 and v4) to allow a
> > > > > > > +client and server to reliably handshake to determine if they are on the
> > > > > > > +same host.  The LOCALIO auxiliary protocol's implementation, which uses
> > > > > > > +the same connection as NFS traffic, follows the pattern established by
> > > > > > > +the NFS ACL protocol extension.
> > > > > > > +
> > > > > > > +The LOCALIO auxiliary protocol is needed to allow robust discovery of
> > > > > > > +clients local to their servers.  Prior to this LOCALIO protocol a
> > > > > > > +fragile sockaddr network address based match against all local network
> > > > > > > +interfaces was attempted.  But unlike the LOCALIO protocol, the
> > > > > > > +sockaddr-based matching didn't handle use of iptables or containers.
> > > > > > > +
> > > > > > > +The robust handshake between local client and server is just the
> > > > > > > +beginning, the ultimate usecase this locality makes possible is the
> > > > > > > +client is able to issue reads, writes and commits directly to the server
> > > > > > > +without having to go over the network.  This is particularly useful for
> > > > > > > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > > > > > > +job local to the server.
> > > > > > > +
> > > > > > > +The performance advantage realized from localio's ability to bypass
> > > > > > > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > > > > > > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > > > > > > +-  With localio:
> > > > > > > +  read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > > > > > > +-  Without localio:
> > > > > > > +  read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > > > > > > +
> > > > > > > +RPC
> > > > > > > +---
> > > > > > > +
> > > > > > > +The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
> > > > > > > +method that allows the Linux nfs client to retrieve a Linux nfs server's
> > > > > > > +uuid.  This protocol isn't part of an IETF standard, nor does it need to
> > > > > > > +be considering it is Linux-to-Linux auxiliary RPC protocol that amounts
> > > > > > > +to an implementation detail.
> > > > > > > +
> > > > > > > +The GETUUID method encodes the server's uuid_t in terms of the fixed
> > > > > > > +UUID_SIZE (16 bytes).  The fixed size opaque encode and decode XDR
> > > > > > > +methods are used instead of the less efficient variable sized methods.
> > > > > > > +
> > > > > > > +The RPC program number for the NFS_LOCALIO_PROGRAM is currently defined
> > > > > > > +as 0x20000002 (but a request for a unique RPC program number assignment
> > > > > > > +has been submitted to IANA.org).
> > > > > > > +
> > > > > > > +The following approximately describes the LOCALIO in a pseudo rpcgen .x
> > > > > > > +syntax:
> > > > > > > +
> > > > > > > +#define UUID_SIZE 16
> > > > > > > +typedef u8 uuid_t<UUID_SIZE>;
> > > > > > > +
> > > > > > > +program NFS_LOCALIO_PROGRAM {
> > > > > > > +     version NULLVERS {
> > > > > > > +        void NULL(void) = 0;
> > > > > > > +	} = 1;
> > > > > > > +     version GETUUIDVERS {
> > > > > > > +        uuid_t GETUUID(void) = 1;
> > > > > > > +	} = 1;
> > > > > > > +} = 0x20000002;
> > > > > > > +
> > > > > > > +The above is the skeleton for the LOCALIO protocol, it doesn't account
> > > > > > > +for NFS v3 and v4 RPC boilerplate (which also marshalls RPC status) that
> > > > > > > +is used to implement GETUUID.
> > > > > > > +
> > > > > > > +Here are the respective XDR results for nfsd and nfs:
> > > > > > 
> > > > > > Hi Mike!
> > > > > > 
> > > > > > A protocol spec describes the on-the-wire data formats, not the
> > > > > > in-memory structure layouts. The below C structures are not
> > > > > > relevant to this specification. This should be all you need here,
> > > > > > if I understand your protocol correctly:
> > > > > > 
> > > > > > /* raw RFC 9562 UUID */
> > > > > > #define UUID_SIZE 16
> > > > > > typedef u8 uuid_t<UUID_SIZE>;
> > > > > > 
> > > > > > union GETUUID1res switch (uint32 status) {
> > > > > 
> > > > > I don't think we need a status in the protocol.  GETUUID always returns
> > > > > a UUID.  There is no possible error condition.
> > > > 
> > > > By having localio use NFS's XDR we're able to piggyback on a status
> > > > being returned by standard NFS RPC handling.
> > > > 
> > > > See:
> > > > nfs3svc_encode_getuuidres and nfs4svc_encode_getuuidres.
> > > > nfs3_xdr_dec_getuuidres and nfs4_xdr_dec_getuuidres (and note the
> > > > FIXME comment about abusing nfs_opnum4).
> > > 
> > > No, let's not piggyback like that. Please make it a separate
> > > XDR implementation just like NFSACL is. Again, LOCALIO is not
> > > an extension of the NFS protocol. Making that claim confuses
> > > people for whom the term "extension" has a very precise meaning.
> > > If we were extending NFS, then yes, adding the new procedures
> > > to the NFS XDR implementation is appropriate, but that's not
> > > what you are doing: you are adding a new side-band protocol.
> > 
> > I'm currently working through the LOCALIO protocol code to make it a
> > single version rather than '3' and '4'.  In the process I'm making it
> > completely separate from the NFS protocol implementation and cleaning up
> > some other bits.  e.g. it shouldn't register with rpcbind.
> > 
> > I'll hopefully post patches in a few hours.  I writing this now to
> > discourage Mike from starting work on this.
> 
> Cool, thanks Neil!

Oh, please base your changes on my latest nfs-localio-for-6.11 branch:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-6.11


  reply	other threads:[~2024-06-21  0:38 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-19 20:40 [PATCH v6 00/18] nfs/nfsd: add support for localio Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 01/18] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 02/18] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 03/18] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 04/18] sunrpc: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 05/18] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-06-21  4:43   ` Jeff Johnson
2024-06-19 20:40 ` [PATCH v6 06/18] nfs/nfsd: add "localio" support Mike Snitzer
2024-06-21  6:08   ` NeilBrown
2024-06-21 23:28     ` Mike Snitzer
2024-06-23 22:27       ` NeilBrown
2024-06-25  4:59         ` Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 07/18] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 08/18] NFS: Enable localio for non-pNFS I/O Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 09/18] pnfs/flexfiles: Enable localio for flexfiles I/O Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 10/18] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 11/18] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 12/18] nfsd: implement v3 and v4 server " Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 13/18] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 14/18] nfsd: prepare to use SRCU to dereference nn->nfsd_serv Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 15/18] nfsd: " Mike Snitzer
2024-06-21  6:35   ` NeilBrown
2024-06-21 23:58     ` Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 16/18] nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh Mike Snitzer
2024-06-19 20:40 ` [PATCH v6 17/18] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-06-20 13:52   ` Chuck Lever
2024-06-20 14:33     ` Mike Snitzer
2024-06-20 14:45       ` Chuck Lever
2024-06-20 22:12     ` NeilBrown
2024-06-20 22:35       ` Mike Snitzer
2024-06-20 23:28         ` Chuck Lever
2024-06-20 23:42           ` NeilBrown
2024-06-21  0:30             ` Mike Snitzer
2024-06-21  0:38               ` Mike Snitzer [this message]
2024-06-21  0:28           ` Mike Snitzer
2024-06-21  2:18             ` Chuck Lever III
2024-06-19 20:40 ` [PATCH v6 18/18] nfs/nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
2024-06-20  5:04 ` [PATCH v6 00/18] nfs/nfsd: add support for localio Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZnTLlRaD7-28onLr@kernel.org \
    --to=snitzer@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=snitzer@hammerspace.com \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.