public inbox for linux-nfs@vger.kernel.org
From: Dan Aloni <dan@kernelim.com>
To: Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-nfs@vger.kernel.org
Cc: Jim Foraker <foraker1@llnl.gov>, Ben Woodard <woodard@redhat.com>
Subject: [RFC] NFSv3 RDMA multipath enhancements
Date: Tue, 12 Jan 2021 16:17:06 +0200
Message-ID: <20210112141706.GA3146539@gmail.com>

Hi Trond, Anna,

We currently have several field installations running NFS and
SunRPC-related patches that greatly improve the performance of NFSv3
clients over RDMA setups where link aggregation is not supported.

I would like to work on integrating several of these changes upstream,
and to discuss their implementation. We managed to get a bandwidth of
33 GB/sec from a single-node NFSv3 mount, and later around 92 GB/sec
from a single mount using further enhancements in RPC request dispatch.

The main change allows specifying multiple target IP addresses in a
single mount, which, combined with nconnect and multiple floating IPs,
provides load balancing over several target nodes. This suits systems
where load balancing is managed by moving a group of floating IP
addresses between nodes, and it works especially well on RoCE setups.

The networking setup on these clients comprises multiple RDMA network
interfaces that are connected to the same network, each with its own
IP address.

The proposed change adds a new `remoteports=<IP-addresses-ranges>`
mount option providing a group of IP addresses, from which `nconnect`
at sunrpc scope picks the target transport address in round-robin
fashion. There is also an accompanying `localports` parameter that
binds transports to local addresses, so that the source port is better
controlled and transports do not all hog a single local interface. So
essentially, this is a form of session trunking that can be thought of
as an extension to the existing `nconnect` parameter.
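
To illustrate the intended distribution only, here is a rough userspace
sketch in Python. It is not the patch code: the function name
`assign_transports` and the addresses are hypothetical, and it assumes
the address ranges given to `remoteports` and `localports` have already
been expanded into plain lists. Each of the `nconnect` transports takes
the next remote address and the next local address, wrapping around
when a list is exhausted.

```python
from itertools import cycle, islice


def assign_transports(nconnect, remote_addrs, local_addrs):
    """Pair each transport with a (local, remote) address, round-robin.

    Hypothetical model of the behaviour described above; the real logic
    would live in the sunrpc transport setup code.
    """
    if nconnect < 1:
        raise ValueError("nconnect must be at least 1")
    if not remote_addrs or not local_addrs:
        raise ValueError("both address lists must be non-empty")

    # cycle() wraps around each list independently, so the two lists
    # do not need to have the same length.
    locals_rr = islice(cycle(local_addrs), nconnect)
    remotes_rr = islice(cycle(remote_addrs), nconnect)
    return list(zip(locals_rr, remotes_rr))


# Example addresses are made up for illustration.
plan = assign_transports(
    nconnect=4,
    remote_addrs=["10.0.0.1", "10.0.0.2"],
    local_addrs=["10.0.1.1", "10.0.1.2", "10.0.1.3"],
)
for local, remote in plan:
    print(f"{local} -> {remote}")
```

With two remote and three local addresses, the four transports are
spread over both targets and all three local interfaces rather than
piling onto one of them.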

To my understanding, NFSv4.x with pNFS has advanced dynamic transport
management logic, along with file layouts that support striping over
file offsets. However, there are cases in which we would like to
achieve good performance even with the older protocol.

Before I adjust the patches I'm testing for v5.11, do you see other
implementation or user interface considerations I should take into
account?

Thanks

-- 
Dan Aloni
