nconnect & repeating BIND_CONN_TO

* nconnect & repeating BIND_CONN_TO_SESSION?
@ 2022-01-07 12:26 Daire Byrne
  2022-01-07 17:17 ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Daire Byrne @ 2022-01-07 12:26 UTC (permalink / raw)
  To: linux-nfs

Hi,

I have been playing around with NFSv4.2 over very high latency
networks (200ms - nocto,actimeo=3600,nconnect) and I noticed that
lookups were much slower than expected.

Looking at a normal stat, at first with simple workloads, I see the
expected LOOKUP/ACCESS pairs for each directory and file in a path.
But after some period of time and with extra load on the client host
(I am also re-exporting these mounts but I don't think that's
relevant), I start to see a BIND_CONN_TO_SESSION call for every
nconnect connection before every LOOKUP & ACCESS. In the case of a
high latency network, these extra roundtrips kill performance.

I am using nconnect because it has some clear performance benefits
when doing sequential reads and writes over high latency connections.
If I use nconnect=16 then I end up with an extra 16
BIND_CONN_TO_SESSION roundtrips before every operation. And once it
gets into this state, there seems to be no way to stop it.

Now this client is actually mounting ~22 servers all with nconnect and
if I reduce the nconnect for all of them to "8" then I am less likely
to see these repeating BIND_CONN_TO_SESSION calls (although I still
see some). If I reduce the nconnect for each mount to 4, then I don't
see the BIND_CONN_TO_SESSION appear (yet) with our workloads. So I'm
wondering if there is some limit like the number of client mounts of
unique server (22) times the total number of TCP connections to each?
So in this case 22 servers x nconnect=8 = 176 client connections.

Or are there some sequence errors that trigger a BIND_CONN_TO_SESSION
and increasing the number of nconnect threads increases the chances of
triggering it? The remote servers are a mix of RHEL7 and RHEL8 and
seem to show the same behaviour.

I tried watching the rpcdebug stream but I'll admit I wasn't really
sure what to look for. I see the same thing on a bunch of recent
kernels (I've only tested from 5.12 upwards). This has probably been
happening for our workloads for quite some time but it's only when the
latency became so large that I noticed all these extra round trips.

Any pointers as to why this might be happening?

Cheers,

Daire
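
As a rough illustration of why those extra round trips hurt at 200ms, the sketch below estimates the added delay per operation. It assumes each BIND_CONN_TO_SESSION costs one full round trip and, in the worst case, that the calls are issued one after another rather than in parallel, which the report does not confirm. The function name and parameters are made up for illustration.

```python
def extra_delay_seconds(rtt_ms: float, nconnect: int, serialized: bool = True) -> float:
    """Estimate added latency per operation from one BIND_CONN_TO_SESSION per connection.

    Assumption: each call costs one full round trip. If the calls are
    serialized the costs add up; if they fully overlap, only one RTT is paid.
    """
    round_trips = nconnect if serialized else 1
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    # 200 ms RTT and nconnect=16, as in the report above.
    print(extra_delay_seconds(200, 16))         # serialized worst case
    print(extra_delay_seconds(200, 16, False))  # fully overlapped best case
```

The real cost presumably sits somewhere between the two cases, depending on how the client schedules the calls.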

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-07 12:26 nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
@ 2022-01-07 17:17 ` J. Bruce Fields
  2022-01-07 17:59   ` Rick Macklem
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: J. Bruce Fields @ 2022-01-07 17:17 UTC (permalink / raw)
  To: Daire Byrne; +Cc: linux-nfs

On Fri, Jan 07, 2022 at 12:26:07PM +0000, Daire Byrne wrote:
> Hi,
> 
> I have been playing around with NFSv4.2 over very high latency
> networks (200ms - nocto,actimeo=3600,nconnect) and I noticed that
> lookups were much slower than expected.
> 
> Looking at a normal stat, at first with simple workloads, I see the
> expected LOOKUP/ACCESS pairs for each directory and file in a path.
> But after some period of time and with extra load on the client host
> (I am also re-exporting these mounts but I don't think that's
> relevant), I start to see a BIND_CONN_TO_SESSION call for every
> nconnect connection before every LOOKUP & ACCESS. In the case of a
> high latency network, these extra roundtrips kill performance.
> 
> I am using nconnect because it has some clear performance benefits
> when doing sequential reads and writes over high latency connections.
> If I use nconnect=16 then I end up with an extra 16
> BIND_CONN_TO_SESSION roundtrips before every operation. And once it
> gets into this state, there seems to be no way to stop it.
> 
> Now this client is actually mounting ~22 servers all with nconnect and
> if I reduce the nconnect for all of them to "8" then I am less likely
> to see these repeating BIND_CONN_TO_SESSION calls (although I still
> see some). If I reduce the nconnect for each mount to 4, then I don't
> see the BIND_CONN_TO_SESSION appear (yet) with our workloads. So I'm
> wondering if there is some limit like the number of client mounts of
> unique server (22) times the total number of TCP connections to each?
> So in this case 22 servers x nconnect=8 = 176 client connections.

Hm, doesn't each of these use up a reserved port on the client by
default?  I forget the details of that.  Does "noresvport" help?

On the server (if Linux) there are maximums on the number of
connections.  It should be logging "too many open connections" if you're
hitting that.

--b.
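
For a rough sense of scale on the reserved-port question, the sketch below compares the connections a client would request against the size of the reserved range. It assumes the Linux sunrpc defaults of 665 and 1023 for min_resvport and max_resvport (worth checking under /sys/module/sunrpc/parameters/ on the client in question) and that nothing else is holding ports in that range. The function name is made up for illustration.

```python
# Assumed Linux sunrpc defaults; verify min_resvport / max_resvport on the client.
DEFAULT_MIN_RESVPORT = 665
DEFAULT_MAX_RESVPORT = 1023


def reserved_port_headroom(servers: int, nconnect: int,
                           min_port: int = DEFAULT_MIN_RESVPORT,
                           max_port: int = DEFAULT_MAX_RESVPORT) -> int:
    """Return how many reserved ports remain after one port per connection.

    A negative result means the mounts would want more reserved ports
    than the range contains (ignoring any other users of the range).
    """
    available = max_port - min_port + 1
    needed = servers * nconnect
    return available - needed


if __name__ == "__main__":
    for nconnect in (4, 8, 16):
        print(nconnect, reserved_port_headroom(22, nconnect))
```

With 22 servers, nconnect=16 leaves almost no headroom under these assumptions, while nconnect=8 leaves plenty, so any other reserved-port users on the client could matter.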

> Or are there some sequence errors that trigger a BIND_CONN_TO_SESSION
> and increasing the number of nconnect threads increases the chances of
> triggering it? The remote servers are a mix of RHEL7 and RHEL8 and
> seem to show the same behaviour.
> 
> I tried watching the rpcdebug stream but I'll admit I wasn't really
> sure what to look for. I see the same thing on a bunch of recent
> kernels (I've only tested from 5.12 upwards). This has probably been
> happening for our workloads for quite some time but it's only when the
> latency became so large that I noticed all these extra round trips.
> 
> Any pointers as to why this might be happening?
> 
> Cheers,
> 
> Daire

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-07 17:17 ` J. Bruce Fields
@ 2022-01-07 17:59   ` Rick Macklem
  2022-01-07 18:56     ` J. Bruce Fields
  2022-01-07 19:15   ` pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4 dai.ngo
  2022-01-10  9:21   ` nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
  2 siblings, 1 reply; 15+ messages in thread
From: Rick Macklem @ 2022-01-07 17:59 UTC (permalink / raw)
  To: J. Bruce Fields, Daire Byrne; +Cc: linux-nfs

Hope you don't mind a top post...

If you capture packets, you will probably see a
callback_down flag in the reply to Sequence for
RPC(s) just before the BindConnectionToSession.
(This is what normally triggers them.)

The question then becomes "why is the server
setting the ..callback_down flag?".

One possibility might be a timeout on a callback
attempt that is too aggressive?
--> You should be able to see the callbacks in the
      packet trace.

Another might be the server attempting the callbacks on
the wrong TCP connection.
--> The FreeBSD server was broken until recently and
      would use any TCP connection to the client and not
      just the one where the Session had enabled the
      backchannel.
      --> If you happen to be mounting a FreeBSD server,
            you cannot use "nconnect" unless it is very
            up-to-date (the fix was done 10 months ago, but
            it takes quite a while to get out in releases).
Look for the CreateSession RPCs when the mount was
first done and see which ones have a backchannel.

Btw, unless the client establishes a new TCP connection
(SYN, SYN/ACK,...) before doing the BindConnectionToSession,
the server might reply NFS4ERR_INVAL. The RFC says this is
to be done, but I'll admit the FreeBSD server doesn't bother.

Good luck with it, rick
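
To make the packet-trace check a little more concrete, here is a small sketch that decodes the sr_status_flags word from a SEQUENCE reply into flag names. The bit values are the SEQ4_STATUS_* constants from RFC 5661; only the callback-related ones are listed, and the helper name is illustrative rather than anything from an existing tool.

```python
from typing import List

# Callback-related SEQ4_STATUS_* bits from RFC 5661 (a subset, not the full list).
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
}


def decode_callback_flags(sr_status_flags: int) -> List[str]:
    """Return the names of the callback-related bits set in sr_status_flags."""
    return [name
            for bit, name in sorted(SEQ4_STATUS_FLAGS.items())
            if sr_status_flags & bit]


if __name__ == "__main__":
    # Hypothetical flag word, as might be copied out of a packet capture.
    print(decode_callback_flags(0x00000201))
```

Which of these bits (if any) the server is actually setting is something only the trace can show.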


________________________________________
From: J. Bruce Fields <bfields@fieldses.org>
Sent: Friday, January 7, 2022 12:17 PM
To: Daire Byrne
Cc: linux-nfs
Subject: Re: nconnect & repeating BIND_CONN_TO_SESSION?


On Fri, Jan 07, 2022 at 12:26:07PM +0000, Daire Byrne wrote:
> Hi,
>
> I have been playing around with NFSv4.2 over very high latency
> networks (200ms - nocto,actimeo=3600,nconnect) and I noticed that
> lookups were much slower than expected.
>
> Looking at a normal stat, at first with simple workloads, I see the
> expected LOOKUP/ACCESS pairs for each directory and file in a path.
> But after some period of time and with extra load on the client host
> (I am also re-exporting these mounts but I don't think that's
> relevant), I start to see a BIND_CONN_TO_SESSION call for every
> nconnect connection before every LOOKUP & ACCESS. In the case of a
> high latency network, these extra roundtrips kill performance.
>
> I am using nconnect because it has some clear performance benefits
> when doing sequential reads and writes over high latency connections.
> If I use nconnect=16 then I end up with an extra 16
> BIND_CONN_TO_SESSION roundtrips before every operation. And once it
> gets into this state, there seems to be no way to stop it.
>
> Now this client is actually mounting ~22 servers all with nconnect and
> if I reduce the nconnect for all of them to "8" then I am less likely
> to see these repeating BIND_CONN_TO_SESSION calls (although I still
> see some). If I reduce the nconnect for each mount to 4, then I don't
> see the BIND_CONN_TO_SESSION appear (yet) with our workloads. So I'm
> wondering if there is some limit like the number of client mounts of
> unique server (22) times the total number of TCP connections to each?
> So in this case 22 servers x nconnect=8 = 176 client connections.

Hm, doesn't each of these use up a reserved port on the client by
default?  I forget the details of that.  Does "noresvport" help?

On the server (if Linux) there are maximums on the number of
connections.  It should be logging "too many open connections" if you're
hitting that.

--b.

> Or are there some sequence errors that trigger a BIND_CONN_TO_SESSION
> and increasing the number of nconnect threads increases the chances of
> triggering it? The remote servers are a mix of RHEL7 and RHEL8 and
> seem to show the same behaviour.
>
> I tried watching the rpcdebug stream but I'll admit I wasn't really
> sure what to look for. I see the same thing on a bunch of recent
> kernels (I've only tested from 5.12 upwards). This has probably been
> happening for our workloads for quite some time but it's only when the
> latency became so large that I noticed all these extra round trips.
>
> Any pointers as to why this might be happening?
>
> Cheers,
>
> Daire


* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-07 17:59   ` Rick Macklem
@ 2022-01-07 18:56     ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2022-01-07 18:56 UTC (permalink / raw)
  To: Rick Macklem; +Cc: Daire Byrne, linux-nfs

On Fri, Jan 07, 2022 at 05:59:33PM +0000, Rick Macklem wrote:
> Hope you don't mind a top post...
> 
> If you capture packets, you will probably see a
> callback_down flag in the reply to Sequence for
> RPC(s) just before the BindConnectionToSession.
> (This is what normally triggers them.)

Hm, I'm pretty sure we *do* have a server bug or two in that area, so
that's a possibility.

--b.

> 
> The question then becomes "why is the server
> setting the ..callback_down flag?".
> 
> One possibility might be a timeout on a callback
> attempt that is too aggressive?
> --> You should be able to see the callbacks in the
>       packet trace.
> 
> Another might be the server attempting the callbacks on
> the wrong TCP connection.
> --> The FreeBSD server was broken until recently and
>       would use any TCP connection to the client and not
>       just the one where the Session had enabled the
>       backchannel.
>       --> If you happen to be mounting a FreeBSD server,
>             you cannot use "nconnect" unless it is very
>             up-to-date (the fix was done 10 months ago, but
>             it takes quite a while to get out in releases).
> Look for the CreateSession RPCs when the mount was
> first done and see which ones have a backchannel.
> 
> Btw, unless the client establishes a new TCP connection
> (SYN, SYN/ACK,...) before doing the BindConnectionToSession,
> the server might reply NFS4ERR_INVAL. The RFC says this is
> to be done, but I'll admit the FreeBSD server doesn't bother.
> 
> Good luck with it, rick
> 
> 
> ________________________________________
> From: J. Bruce Fields <bfields@fieldses.org>
> Sent: Friday, January 7, 2022 12:17 PM
> To: Daire Byrne
> Cc: linux-nfs
> Subject: Re: nconnect & repeating BIND_CONN_TO_SESSION?
> 
> 
> On Fri, Jan 07, 2022 at 12:26:07PM +0000, Daire Byrne wrote:
> > Hi,
> >
> > I have been playing around with NFSv4.2 over very high latency
> > networks (200ms - nocto,actimeo=3600,nconnect) and I noticed that
> > lookups were much slower than expected.
> >
> > Looking at a normal stat, at first with simple workloads, I see the
> > expected LOOKUP/ACCESS pairs for each directory and file in a path.
> > But after some period of time and with extra load on the client host
> > (I am also re-exporting these mounts but I don't think that's
> > relevant), I start to see a BIND_CONN_TO_SESSION call for every
> > nconnect connection before every LOOKUP & ACCESS. In the case of a
> > high latency network, these extra roundtrips kill performance.
> >
> > I am using nconnect because it has some clear performance benefits
> > when doing sequential reads and writes over high latency connections.
> > If I use nconnect=16 then I end up with an extra 16
> > BIND_CONN_TO_SESSION roundtrips before every operation. And once it
> > gets into this state, there seems to be no way to stop it.
> >
> > Now this client is actually mounting ~22 servers all with nconnect and
> > if I reduce the nconnect for all of them to "8" then I am less likely
> > to see these repeating BIND_CONN_TO_SESSION calls (although I still
> > see some). If I reduce the nconnect for each mount to 4, then I don't
> > see the BIND_CONN_TO_SESSION appear (yet) with our workloads. So I'm
> > wondering if there is some limit like the number of client mounts of
> > unique server (22) times the total number of TCP connections to each?
> > So in this case 22 servers x nconnect=8 = 176 client connections.
> 
> Hm, doesn't each of these use up a reserved port on the client by
> default?  I forget the details of that.  Does "noresvport" help?
> 
> On the server (if Linux) there are maximums on the number of
> connections.  It should be logging "too many open connections" if you're
> hitting that.
> 
> --b.
> 
> > Or are there some sequence errors that trigger a BIND_CONN_TO_SESSION
> > and increasing the number of nconnect threads increases the chances of
> > triggering it? The remote servers are a mix of RHEL7 and RHEL8 and
> > seem to show the same behaviour.
> >
> > I tried watching the rpcdebug stream but I'll admit I wasn't really
> > sure what to look for. I see the same thing on a bunch of recent
> > kernels (I've only tested from 5.12 upwards). This has probably been
> > happening for our workloads for quite some time but it's only when the
> > latency became so large that I noticed all these extra round trips.
> >
> > Any pointers as to why this might be happening?
> >
> > Cheers,
> >
> > Daire

* pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4
  2022-01-07 17:17 ` J. Bruce Fields
  2022-01-07 17:59   ` Rick Macklem
@ 2022-01-07 19:15   ` dai.ngo
  2022-01-07 19:41     ` Chuck Lever III
  2022-01-10  9:21   ` nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
  2 siblings, 1 reply; 15+ messages in thread
From: dai.ngo @ 2022-01-07 19:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Hi Bruce,

Commit 428a23d2bf0ca 'nfsd: skip some unnecessary stats in the v4 case' causes
these tests to fail when the filesystem is ext4. It works fine with
btrfs and xfs. The reason it fails with ext4 is that ext4 does not
have i_version support. SB_I_VERSION is not set in the superblock,
so we skip the fh_getattr and just use fh_post_attr, which is 0, to
fill fh_post_change. I'm not quite sure what the fix for this is. If we skip the fh_getattr
for v4 then fh_post_attr is 0, which causes the returned change attribute to be 0. -Dai

* Re: pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4
  2022-01-07 19:15   ` pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4 dai.ngo
@ 2022-01-07 19:41     ` Chuck Lever III
  2022-01-07 20:01       ` dai.ngo
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2022-01-07 19:41 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Bruce Fields, Linux NFS Mailing List

Hi Dai-

> On Jan 7, 2022, at 2:15 PM, dai.ngo@oracle.com wrote:
> 
> Hi Bruce,
> 
> Commit 428a23d2bf0ca 'nfsd: skip some unnecessary stats in the v4 case' causes these tests to fail when the filesystem is ext4. It works fine with btrfs and xfs. The reason it fails with ext4 is that ext4 does not have i_version support. SB_I_VERSION is not set in the superblock, so we skip the fh_getattr and just use fh_post_attr, which is 0, to fill fh_post_change. I'm not quite sure what the fix for this is. If we skip the fh_getattr
> for v4 then fh_post_attr is 0, which causes the returned change attribute to be 0. -Dai

I've got a fix for this issue scheduled for v5.17, and hopefully
it can be back-ported to stable kernels too.


--
Chuck Lever




* Re: pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4
  2022-01-07 19:41     ` Chuck Lever III
@ 2022-01-07 20:01       ` dai.ngo
  0 siblings, 0 replies; 15+ messages in thread
From: dai.ngo @ 2022-01-07 20:01 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Bruce Fields, Linux NFS Mailing List


On 1/7/22 11:41 AM, Chuck Lever III wrote:
> Hi Dai-
>
>> On Jan 7, 2022, at 2:15 PM, dai.ngo@oracle.com wrote:
>>
>> Hi Bruce,
>>
>> Commit 428a23d2bf0ca 'nfsd: skip some unnecessary stats in the v4 case' causes these tests to fail when the filesystem is ext4. It works fine with btrfs and xfs. The reason it fails with ext4 is that ext4 does not have i_version support. SB_I_VERSION is not set in the superblock, so we skip the fh_getattr and just use fh_post_attr, which is 0, to fill fh_post_change. I'm not quite sure what the fix for this is. If we skip the fh_getattr
>> for v4 then fh_post_attr is 0, which causes the returned change attribute to be 0. -Dai
> I've got a fix for this issue scheduled for v5.17, and hopefully
> it can be back-ported to stable kernels too.

Thank you Chuck,

-Dai


* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-07 17:17 ` J. Bruce Fields
  2022-01-07 17:59   ` Rick Macklem
  2022-01-07 19:15   ` pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4 dai.ngo
@ 2022-01-10  9:21   ` Daire Byrne
  2022-01-10 14:52     ` J. Bruce Fields
  2 siblings, 1 reply; 15+ messages in thread
From: Daire Byrne @ 2022-01-10  9:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Fri, 7 Jan 2022 at 17:17, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> Hm, doesn't each of these use up a reserved port on the client by
> default?  I forget the details of that.  Does "noresvport" help?

Yes, I think this might be the issue. It seems like only 13/16
connections actually initially get setup at mount time and then it
tries to connect the full 16 once some activity to the mountpoint
starts. My guess is that we run out of reserved ports at that point
and continually trigger the BIND_CONN_TO_SESSION.

I can use noresvport with an NFSv3 client mount and it seems to do the
right thing (with the server exporting "insecure"), but it doesn't seem
to have any effect on a NFSv4.2 mount (still uses ports <1024). Is
that expected? Perhaps NFSv4.2 doesn't allow "insecure" mounts?

Daire
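
One way to check which source ports a mount is really using is to look at the client's TCP connections to port 2049. The sketch below parses text in the column layout of `ss -tn` and splits the local ports into reserved (below 1024) and unreserved. The sample text, addresses and helper names are hypothetical and only there to show the idea.

```python
from typing import List, Tuple


def nfs_source_ports(ss_output: str, nfs_port: int = 2049) -> List[int]:
    """Return local ports of TCP connections whose peer port is nfs_port."""
    ports = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 5 or fields[0] == "State":
            continue
        local, peer = fields[3], fields[4]
        if peer.rsplit(":", 1)[-1] != str(nfs_port):
            continue
        ports.append(int(local.rsplit(":", 1)[-1]))
    return ports


def split_reserved(ports: List[int]) -> Tuple[List[int], List[int]]:
    """Split ports into (reserved, unreserved) using the usual 1024 boundary."""
    reserved = [p for p in ports if p < 1024]
    unreserved = [p for p in ports if p >= 1024]
    return reserved, unreserved


# Hypothetical sample in the style of `ss -tn` output.
SAMPLE = """\
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB  0       0       10.0.0.5:812        10.0.1.9:2049
ESTAB  0       0       10.0.0.5:49152      10.0.1.9:2049
ESTAB  0       0       10.0.0.5:22         10.0.0.7:51514
"""

if __name__ == "__main__":
    print(split_reserved(nfs_source_ports(SAMPLE)))
```

On a mount where noresvport takes effect, the reserved list would be expected to come back empty.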

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-10  9:21   ` nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
@ 2022-01-10 14:52     ` J. Bruce Fields
  2022-01-10 17:21       ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2022-01-10 14:52 UTC (permalink / raw)
  To: Daire Byrne; +Cc: linux-nfs

On Mon, Jan 10, 2022 at 09:21:44AM +0000, Daire Byrne wrote:
> On Fri, 7 Jan 2022 at 17:17, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > Hm, doesn't each of these use up a reserved port on the client by
> > default?  I forget the details of that.  Does "noresvport" help?
> 
> Yes, I think this might be the issue. It seems like only 13/16
> connections actually initially get setup at mount time and then it
> tries to connect the full 16 once some activity to the mountpoint
> starts. My guess is that we run out of reserved ports at that point
> and continually trigger the BIND_CONN_TO_SESSION.
> 
> I can use noresvport with an NFSv3 client mount and it seems to do the
> right thing (with the server exporting "insecure"), but it doesn't seem
> to have any effect on a NFSv4.2 mount (still uses ports <1024). Is
> that expected?

No.  Sounds like something's going wrong.

--b.

> Perhaps NFSv4.2 doesn't allow "insecure" mounts?
> 
> Daire

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-10 14:52     ` J. Bruce Fields
@ 2022-01-10 17:21       ` J. Bruce Fields
  2022-01-23 21:56         ` Daire Byrne
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2022-01-10 17:21 UTC (permalink / raw)
  To: Daire Byrne; +Cc: linux-nfs

On Mon, Jan 10, 2022 at 09:52:10AM -0500, J. Bruce Fields wrote:
> On Mon, Jan 10, 2022 at 09:21:44AM +0000, Daire Byrne wrote:
> > On Fri, 7 Jan 2022 at 17:17, J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > Hm, doesn't each of these use up a reserved port on the client by
> > > default?  I forget the details of that.  Does "noresvport" help?
> > 
> > Yes, I think this might be the issue. It seems like only 13/16
> > connections actually initially get setup at mount time and then it
> > tries to connect the full 16 once some activity to the mountpoint
> > starts. My guess is that we run out of reserved ports at that point
> > and continually trigger the BIND_CONN_TO_SESSION.
> > 
> > I can use noresvport with an NFSv3 client mount and it seems to do the
> > right thing (with the server exporting "insecure"), but it doesn't seem
> > to have any effect on a NFSv4.2 mount (still uses ports <1024). Is
> > that expected?
> 
> No.  Sounds like something's going wrong.

Looks to me like the mount option may just be getting lost on the way
down to the rpc client somehow, but I'm not quite sure how it's supposed
to work, and a naive attempt to copy what v3 is doing (below) wasn't
sufficient.

--b.

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index d8b5a250ca05..d0196dfb48a1 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -895,7 +895,8 @@ static int nfs4_set_client(struct nfs_server *server,
 		int proto, const struct rpc_timeout *timeparms,
 		u32 minorversion, unsigned int nconnect,
 		unsigned int max_connect,
-		struct net *net)
+		struct net *net,
+		bool noresvport)
 {
 	struct nfs_client_initdata cl_init = {
 		.hostname = hostname,
@@ -915,6 +916,8 @@ static int nfs4_set_client(struct nfs_server *server,
 		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
 	else
 		cl_init.max_connect = max_connect;
+	if (noresvport)
+		set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 	if (proto == XPRT_TRANSPORT_TCP)
 		cl_init.nconnect = nconnect;
 
@@ -1156,7 +1159,8 @@ static int nfs4_init_server(struct nfs_server *server, struct fs_context *fc)
 				ctx->minorversion,
 				ctx->nfs_server.nconnect,
 				ctx->nfs_server.max_connect,
-				fc->net_ns);
+				fc->net_ns,
+				ctx->flags & NFS_MOUNT_NORESVPORT);
 	if (error < 0)
 		return error;
 
@@ -1246,7 +1250,8 @@ struct nfs_server *nfs4_create_referral_server(struct fs_context *fc)
 				parent_client->cl_mvops->minor_version,
 				parent_client->cl_nconnect,
 				parent_client->cl_max_connect,
-				parent_client->cl_net);
+				parent_client->cl_net,
+				ctx->flags & NFS_MOUNT_NORESVPORT);
 	if (!error)
 		goto init_server;
 #endif	/* IS_ENABLED(CONFIG_SUNRPC_XPRT_RDMA) */
@@ -1262,7 +1267,8 @@ struct nfs_server *nfs4_create_referral_server(struct fs_context *fc)
 				parent_client->cl_mvops->minor_version,
 				parent_client->cl_nconnect,
 				parent_client->cl_max_connect,
-				parent_client->cl_net);
+				parent_client->cl_net,
+				ctx->flags & NFS_MOUNT_NORESVPORT);
 	if (error < 0)
 		goto error;
 
@@ -1335,7 +1341,8 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
 	error = nfs4_set_client(server, hostname, sap, salen, buf,
 				clp->cl_proto, clnt->cl_timeout,
 				clp->cl_minorversion,
-				clp->cl_nconnect, clp->cl_max_connect, net);
+				clp->cl_nconnect, clp->cl_max_connect, net,
+				test_bit(NFS_CS_NORESVPORT, &clp->cl_flags));
 	clear_bit(NFS_MIG_TSM_POSSIBLE, &server->mig_status);
 	if (error != 0) {
 		nfs_server_insert_lists(server);

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-10 17:21       ` J. Bruce Fields
@ 2022-01-23 21:56         ` Daire Byrne
  2022-01-23 22:42           ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Daire Byrne @ 2022-01-23 21:56 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Mon, 10 Jan 2022 at 17:21, J. Bruce Fields <bfields@fieldses.org> wrote:
> Looks to me like the mount option may just be getting lost on the way
> down to the rpc client somehow, but I'm not quite sure how it's supposed
> to work, and a naive attempt to copy what v3 is doing (below) wasn't
> sufficient.
>
> --b.

Should I just open a bugzilla? It would be nice to mount more than 7
remote v4 servers with nconnect=16.

I did a very quick test of RHEL7 and RHEL8 client kernels and it seems
like they are doing the right thing with vers=4.2,noresvport.

I suspect it's just more recent kernels that have lost the ability to
use v4+noresvport (or newer nfs-utils?).

Daire

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-23 21:56         ` Daire Byrne
@ 2022-01-23 22:42           ` J. Bruce Fields
  2022-01-24 12:33             ` Daire Byrne
  0 siblings, 1 reply; 15+ messages in thread
From: J. Bruce Fields @ 2022-01-23 22:42 UTC (permalink / raw)
  To: Daire Byrne; +Cc: linux-nfs

On Sun, Jan 23, 2022 at 09:56:19PM +0000, Daire Byrne wrote:
> On Mon, 10 Jan 2022 at 17:21, J. Bruce Fields <bfields@fieldses.org> wrote:
> > Looks to me like the mount option may just be getting lost on the way
> > down to the rpc client somehow, but I'm not quite sure how it's supposed
> > to work, and a naive attempt to copy what v3 is doing (below) wasn't
> > sufficient.
> >
> > --b.
> 
> Should I just open a bugzilla?

Worth a try.  It's probably nothing complicated, just a matter of time
to dig into it....

> It would be nice to mount more than 7
> remote v4 servers with nconnect=16.
> 
> I did a very quick test of RHEL7 and RHEL8 client kernels and it seems
> like they are doing the right thing with vers=4.2,noresvport.
> 
> I suspect it's just more recent kernels that have lost the ability to
> use v4+noresvport

Yes, thanks for checking that.  Let us know if you narrow down the
kernel any more.

> (or newer nfs-utils?).

Pretty sure nfs-utils is doing the right thing and passing down the
noresvport option:

# strace -f -etrace=mount mount -overs=4.2,resvport localhost:/ /mnt/
strace: Process 4088 attached
[pid  4088] mount("localhost:/", "/mnt", "nfs", 0, "vers=4.2,resvport,addr=127.0.0.1"...)

--b.

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-23 22:42           ` J. Bruce Fields
@ 2022-01-24 12:33             ` Daire Byrne
  2022-02-07 15:21               ` Daire Byrne
  0 siblings, 1 reply; 15+ messages in thread
From: Daire Byrne @ 2022-01-24 12:33 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Sun, 23 Jan 2022 at 22:42, J. Bruce Fields <bfields@fieldses.org> wrote:
> > I suspect it's just more recent kernels that have lost the ability to
> > use v4+noresvport
>
> Yes, thanks for checking that.  Let us know if you narrow down the
> kernel any more.

https://bugzilla.kernel.org/show_bug.cgi?id=215526

I think it stopped working somewhere between v5.11 and v5.12. I'll try
and bisect it this week.

Daire

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-01-24 12:33             ` Daire Byrne
@ 2022-02-07 15:21               ` Daire Byrne
  2022-02-07 15:53                 ` J. Bruce Fields
  0 siblings, 1 reply; 15+ messages in thread
From: Daire Byrne @ 2022-02-07 15:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Trond kindly posted a patch to fix the noresvport mount issue with
v4.2 and recent kernels.

I tested it quickly and verified ports greater than 1024 were being
used as expected, but it seems the same issue persists. It still feels
like it's related to the total number of server + nconnect pairings.

So I can have 20 servers mounted with nconnect=4 or 10 servers mounted
with nconnect=8 but any combination that increases the total
connections on the client past that and at least one of the servers
ends up in a state such that it's just sending a bind_conn_to_session
with every operation.

I'll see if I can discern anything from any packet capture (as
suggested earlier by Rick), but it's hard to reproduce exactly in time
and on demand. My theory is that maybe there is a timeout on the
callback and that adding more connections is just adding more
load/throughput and making a timeout more likely.

My workaround atm is to simply use NFSv3 instead of NFSv4 which might
be a better choice for this kind of workload anyway.

Daire

On Mon, 24 Jan 2022 at 12:33, Daire Byrne <daire@dneg.com> wrote:
>
> On Sun, 23 Jan 2022 at 22:42, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > I suspect it's just more recent kernels that have lost the ability to
> > > use v4+noresvport
> >
> > Yes, thanks for checking that.  Let us know if you narrow down the
> > kernel any more.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215526
>
> I think it stopped working somewhere between v5.11 and v5.12. I'll try
> and bisect it this week.
>
> Daire

* Re: nconnect & repeating BIND_CONN_TO_SESSION?
  2022-02-07 15:21               ` Daire Byrne
@ 2022-02-07 15:53                 ` J. Bruce Fields
  0 siblings, 0 replies; 15+ messages in thread
From: J. Bruce Fields @ 2022-02-07 15:53 UTC (permalink / raw)
  To: Daire Byrne; +Cc: linux-nfs

The server enforces a limit on the total number of connections in
net/sunrpc/svc.c:svc_check_conn_limits().  Maybe that's what you're
hitting.

By default it's (number of threads + 3) * 20.  You can bump the number
of nfsd threads or change /proc/fs/nfsd/max_connections.

Weird that your limit would be 80, though, which is the number you'd
expect if the server was running with just one thread.

The only other rpc server I can think of that's involved here is the NFS
client's callback server, which does have only one thread, but
nfs_callback_create_svc() does:

	/* As there is only one thread we need to over-ride the default
	 * maximum of 80 connections
	 */
	serv->sv_maxconn = 1024;

and has since the beginning.  I can't see why that wouldn't work.  If
80's really your limit, though, that seems like an odd coincidence.
Have you seen that "too many connections" warning in the client logs?

--b.
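
The arithmetic behind that "80" can be sketched as follows, taking the formula exactly as stated above and the server/nconnect combinations from the report. The function names are illustrative and nothing here is read from a live system.

```python
def default_max_connections(nr_threads: int) -> int:
    """Default connection limit as described above: (number of threads + 3) * 20."""
    return (nr_threads + 3) * 20


def total_connections(servers: int, nconnect: int) -> int:
    """Total TCP connections for a set of mounts, one per nconnect slot per server."""
    return servers * nconnect


if __name__ == "__main__":
    # A single-threaded RPC server gets the limit mentioned above.
    print(default_max_connections(1))
    # The two combinations reported as the largest that still work.
    print(total_connections(20, 4), total_connections(10, 8))
```

Both reported combinations come out at exactly the single-thread limit, which is what makes the coincidence look suspicious.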

On Mon, Feb 07, 2022 at 03:21:41PM +0000, Daire Byrne wrote:
> Trond kindly posted a patch to fix the noresvport mount issue with
> v4.2 and recent kernels.
> 
> I tested it quickly and verified ports greater than 1024 were being
> used as expected, but it seems the same issue persists. It still feels
> like it's related to the total number of server + nconnect pairings.
> 
> So I can have 20 servers mounted with nconnect=4 or 10 servers mounted
> with nconnect=8 but any combination that increases the total
> connections on the client past that and at least one of the servers
> ends up in a state such that it's just sending a bind_conn_to_session
> with every operation.
> 
> I'll see if I can discern anything from any packet capture (as
> suggested earlier by Rick), but it's hard to reproduce exactly in time
> and on demand. My theory is that maybe there is a timeout on the
> callback and that adding more connections is just adding more
> load/throughput and making a timeout more likely.
> 
> My workaround atm is to simply use NFSv3 instead of NFSv4 which might
> be a better choice for this kind of workload anyway.
> 
> Daire
> 
> 
> On Mon, 24 Jan 2022 at 12:33, Daire Byrne <daire@dneg.com> wrote:
> >
> > On Sun, 23 Jan 2022 at 22:42, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > I suspect it's just more recent kernels that have lost the ability to
> > > > use v4+noresvport
> > >
> > > Yes, thanks for checking that.  Let us know if you narrow down the
> > > kernel any more.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=215526
> >
> > I think it stopped working somewhere between v5.11 and v5.12. I'll try
> > and bisect it this week.
> >
> > Daire

end of thread, other threads:[~2022-02-07 15:58 UTC | newest]

Thread overview: 15+ messages
2022-01-07 12:26 nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
2022-01-07 17:17 ` J. Bruce Fields
2022-01-07 17:59   ` Rick Macklem
2022-01-07 18:56     ` J. Bruce Fields
2022-01-07 19:15   ` pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4 dai.ngo
2022-01-07 19:41     ` Chuck Lever III
2022-01-07 20:01       ` dai.ngo
2022-01-10  9:21   ` nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
2022-01-10 14:52     ` J. Bruce Fields
2022-01-10 17:21       ` J. Bruce Fields
2022-01-23 21:56         ` Daire Byrne
2022-01-23 22:42           ` J. Bruce Fields
2022-01-24 12:33             ` Daire Byrne
2022-02-07 15:21               ` Daire Byrne
2022-02-07 15:53                 ` J. Bruce Fields
