* nconnect & repeating BIND_CONN_TO_SESSION?
@ 2022-01-07 12:26 Daire Byrne
From: Daire Byrne @ 2022-01-07 12:26 UTC (permalink / raw)
  To: linux-nfs

Hi,

I have been experimenting with NFSv4.2 over very high latency
networks (200ms latency; mount options nocto,actimeo=3600,nconnect)
and I noticed that lookups were much slower than expected.

Looking at a normal stat with simple workloads, I at first see the
expected LOOKUP/ACCESS pairs for each directory and file in a path.
But after some period of time and with extra load on the client host
(I am also re-exporting these mounts but I don't think that's
relevant), I start to see a BIND_CONN_TO_SESSION call for every
nconnect connection before every LOOKUP & ACCESS. On a high latency
network, these extra roundtrips kill performance.

I am using nconnect because it has some clear performance benefits
when doing sequential reads and writes over high latency connections.
If I use nconnect=16 then I end up with an extra 16
BIND_CONN_TO_SESSION roundtrips before every operation. And once it
gets into this state, there seems to be no way to stop it.
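
As a rough illustration of the cost described above, the sketch below
multiplies the number of nconnect connections by the 200ms latency. It
assumes (this is not stated in the report) that the
BIND_CONN_TO_SESSION roundtrips are issued one after another rather
than in parallel, so the result is an upper bound. The function name
is hypothetical.

```python
def extra_latency_seconds(nconnect: int, latency_ms: float) -> float:
    """Upper bound on added delay per operation, assuming one
    BIND_CONN_TO_SESSION roundtrip per connection, run back to back
    (an assumption; the roundtrips may overlap in practice)."""
    return nconnect * latency_ms / 1000.0


if __name__ == "__main__":
    # 200ms is the latency quoted in the report; 4, 8 and 16 are the
    # nconnect values it mentions.
    for n in (4, 8, 16):
        delay = extra_latency_seconds(n, 200)
        print(f"nconnect={n}: up to {delay:.1f}s extra per operation")
```

Under that assumption, nconnect=16 adds up to 3.2 seconds before each
LOOKUP or ACCESS.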

Now this client is actually mounting ~22 servers, all with nconnect,
and if I reduce nconnect for all of them to 8 then I am less likely
to see these repeating BIND_CONN_TO_SESSION calls (although I still
see some). If I reduce nconnect for each mount to 4, then I don't
see the BIND_CONN_TO_SESSION calls appear (yet) with our workloads.
So I'm wondering if there is some limit on the total number of client
TCP connections, i.e. the number of unique servers mounted (22) times
the nconnect value for each? In this case 22 servers x nconnect=8 =
176 client connections.
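
The arithmetic behind that question can be tabulated for the three
nconnect values tried. This is only a sketch of the multiplication in
the paragraph above; whether any such connection limit exists is the
open question, not something the code establishes. The function name
is hypothetical.

```python
def total_connections(servers: int, nconnect: int) -> int:
    """Total client TCP connections if every mounted server uses the
    same nconnect value."""
    return servers * nconnect


if __name__ == "__main__":
    # 22 servers is the approximate count given in the report.
    for n in (4, 8, 16):
        print(f"22 servers x nconnect={n} = {total_connections(22, n)}")
```

That gives 88, 176 and 352 connections for nconnect=4, 8 and 16
respectively.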

Or are there some sequence errors that trigger a BIND_CONN_TO_SESSION
and increasing the number of nconnect threads increases the chances of
triggering it? The remote servers are a mix of RHEL7 and RHEL8 and
seem to show the same behaviour.

I tried watching the rpcdebug stream but I'll admit I wasn't really
sure what to look for. I see the same thing on a bunch of recent
kernels (I've only tested from 5.12 upwards). This has probably been
happening for our workloads for quite some time but it's only when the
latency became so large that I noticed all these extra round trips.

Any pointers as to why this might be happening?

Cheers,

Daire


Thread overview: 15+ messages (newest: 2022-02-07 15:58 UTC)
2022-01-07 12:26 nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
2022-01-07 17:17 ` J. Bruce Fields
2022-01-07 17:59   ` Rick Macklem
2022-01-07 18:56     ` J. Bruce Fields
2022-01-07 19:15   ` pynfs regression with nfs4.1 RNM18, RNM19 and RNM20 with ext4 dai.ngo
2022-01-07 19:41     ` Chuck Lever III
2022-01-07 20:01       ` dai.ngo
2022-01-10  9:21   ` nconnect & repeating BIND_CONN_TO_SESSION? Daire Byrne
2022-01-10 14:52     ` J. Bruce Fields
2022-01-10 17:21       ` J. Bruce Fields
2022-01-23 21:56         ` Daire Byrne
2022-01-23 22:42           ` J. Bruce Fields
2022-01-24 12:33             ` Daire Byrne
2022-02-07 15:21               ` Daire Byrne
2022-02-07 15:53                 ` J. Bruce Fields
