From: Trond Myklebust <trondmy@hammerspace.com>
To: "SteveD@RedHat.com" <SteveD@RedHat.com>,
"olga.kornievskaia@gmail.com" <olga.kornievskaia@gmail.com>,
"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 1/1] NFSv4.0 allow nconnect for v4.0
Date: Mon, 27 Jan 2020 19:35:04 +0000
Message-ID: <ba661bbfe1c87c7b347841bf973511e396665ee3.camel@hammerspace.com>
In-Reply-To: <3e85d5f8-bcdc-b8ee-3ea4-b918f084fd19@RedHat.com>
On Mon, 2020-01-27 at 13:39 -0500, Steve Dickson wrote:
>
> On 1/22/20 7:23 PM, Trond Myklebust wrote:
> > On Mon, 2020-01-20 at 10:35 -0500, Steve Dickson wrote:
> > > Hello,
> > >
> > > On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > > fs/nfs/nfs4client.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > > index 460d625..4df3fb0 100644
> > > > --- a/fs/nfs/nfs4client.c
> > > > +++ b/fs/nfs/nfs4client.c
> > > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
> > > >
> > > >  	if (minorversion == 0)
> > > >  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > > > -	else if (proto == XPRT_TRANSPORT_TCP)
> > > > +	if (proto == XPRT_TRANSPORT_TCP)
> > > >  		cl_init.nconnect = nconnect;
> > > >
> > > >  	if (server->flags & NFS_MOUNT_NORESVPORT)
> > > >
> > > Tested-by: Steve Dickson <steved@redhat.com>
> > >
> > > > With this patch, v4.0 mounts act just like v4.1/v4.2 mounts.
> > > > But is that a good thing? :-)
> > >
> > > Here is what I've found in my testing...
> > >
> > > mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> > >
> > > Will create 12 TCP connections and maintain those 12
> > > connections until the umount happens. By maintain I mean
> > > if the connection times out, it is reconnected
> > > to maintain the 12 connections
> > >
> > > # mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> > > # netstat -an | grep 172.31.1.54 | wc -l
> > > 12
> > > # netstat -an | grep 172.31.1.54
> > > > tcp   0   0 172.31.1.24:901   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:667   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:746   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:672   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:832   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:895   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:673   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:732   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:795   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:918   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:674   172.31.1.54:2049   ESTABLISHED
> > > > tcp   0   0 172.31.1.24:953   172.31.1.54:2049   ESTABLISHED
> > >
> > > # umount /mnt/tmp
> > > # netstat -an | grep 172.31.1.54 | wc -l
> > > 12
> > > # netstat -an | grep 172.31.1.54
> > > > tcp   0   0 172.31.1.24:901   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:667   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:746   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:672   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:832   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:895   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:673   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:732   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:795   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:918   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:674   172.31.1.54:2049   TIME_WAIT
> > > > tcp   0   0 172.31.1.24:953   172.31.1.54:2049   TIME_WAIT
> > >
> > > Is this the expected behavior?
> > >
> > > If so I have a few concerns...
> > >
> > > > * The connections walk all over the /etc/services namespace,
> > > >   meaning they use ports that are reserved for registered
> > > >   services, something we've tried to avoid in userland by not
> > > >   binding to privileged ports and by blacklisting ports via
> > > >   /etc/bindresvport.blacklist
> > > >
> > > > * When the unmount happens, all those connections go into
> > > >   TIME_WAIT on privileged ports, and there are only so many of
> > > >   those. Not good during mount storms (when a server reboots and
> > > >   thousands of home dirs are remounted).
> > >
> > > * No man page describing the new feature.
> > >
> > > I realize there is not much we can do about some of these
> > > (aka umount==>TIME_WAIT) but I think we need to document
> > > what we are doing to people's connection namespace when
> > > they use this feature.
> >
> > > I'm not sure that I understand the concern. The connections are
> > > limited to a specific window of ports by the
> > > min_resvport/max_resvport sunrpc module parameters just as they
> > > were before we added 'nconnect'. Nothing has changed in the way we
> > > choose ports...
> >
> Maybe this problem has existed for a while...
>
> Here are the min/max ports
> RPC_DEF_MIN_RESVPORT (665U)
> RPC_DEF_MAX_RESVPORT (1023U)
>
> From /etc/services, these are the services in that range:
> acp(674), ha-cluster(694), kerberos-adm(749), kerberos-iv(750)
> webster(765), phonebook(767), rsync(873), rquotad(875),
> telnets(992), imaps(993), pop3s(995)
>
> Granted a lot of these are unused/legacy services, but some of
> them, like imaps and rsync, are still used.
>
> My point is that since the nconnect connections are persistent for
> the life of the mount, we could end up squatting on ports other
> services will need.
>
> Maybe there is not much we can do about this... But we should explain
> somewhere, like the man page, that nconnect will create up to 16
> persistent connections on registered privileged ports.
If users need to run servers on a port that might be chosen by a
kernel nfs, lockd or rpcbind client, then they can guarantee no
collisions by redefining the 'privileged ports' available to sunrpc to
any arbitrary range <portnr1> - <portnr2>:

Either
 * Reserve the ports at module load time, by adding a line to a config
   file /etc/modprobe.d/foo.conf of the form
      options sunrpc min_resvport=<portnr1> max_resvport=<portnr2>
or
 * Change the port reservation after module load (but before mounting
   your first NFS filesystem) using
      echo '<portnr1>' > /sys/module/sunrpc/parameters/min_resvport
      echo '<portnr2>' > /sys/module/sunrpc/parameters/max_resvport

This is something users ought to be doing already if they need
guaranteed availability of specific ports in the range 665-1023 while
using the NFS client. That is entirely independent of whether or not
they are using nconnect.
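
To decide on a range, it may help to see which registered services
overlap with a candidate window. The following is only a rough sketch:
the helper name services_in_range is hypothetical, and it assumes the
usual "name port/proto" layout of /etc/services. It does not change any
sunrpc setting; it only lists names and ports.

```shell
#!/bin/sh
# Hypothetical helper: list services whose registered port lies in
# the inclusive range MIN..MAX.
# Usage: services_in_range MIN MAX [FILE]   (FILE defaults to /etc/services)
services_in_range() {
    min=$1
    max=$2
    file=${3:-/etc/services}
    awk -v min="$min" -v max="$max" '
        /^[[:space:]]*#/ || NF < 2 { next }   # skip comments and blank lines
        {
            split($2, pp, "/")                # second field looks like 873/tcp
            port = pp[1] + 0
            # print each name/port pair once, even if listed for tcp and udp
            if (port >= min && port <= max && !seen[$1 "," port]++)
                printf "%s(%d)\n", $1, port
        }
    ' "$file"
}
```

For example, "services_in_range 665 1023" covers the default window
discussed above, and the current limits can be read back from
/sys/module/sunrpc/parameters/min_resvport and max_resvport once the
sunrpc module is loaded.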
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com