* NFS client low performance in concurrent environment.
From: Mkrtchyan, Tigran @ 2025-04-03 13:01 UTC
To: linux-nfs
Dear NFS fellows,
As part of a research project, we have adapted IOR[1], a benchmark well
known in the HPC community, to support libnfs[2]. After running a number
of tests, our observation is that multiple userspace clients achieve
higher throughput than the in-kernel client (or server).
In the tests below, the NFS server runs RHEL9 with kernel
5.14.0-503.23.1.el9_5.x86_64 and exports /mnt. The results are in
operations per second, so higher numbers are better.
The client is a single 80-core host running RHEL9 with kernel
5.14.0-427.26.1.el9_4.x86_64. We used NFSv3 to eliminate NFSv4's
open/close overhead on zero-byte files.
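For completeness, the in-kernel mount in TEST 2 below was a plain NFSv3
mount along these lines (the exact options were not recorded, so take
this as an assumption):
```
$ mount -t nfs -o vers=3 lab008:/mnt /mnt
```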
TEST 1: libnfs
```
$ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -a LIBNFS --libnfs.url='nfs://lab008/mnt/?uid=0&gid=0&version=3' -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /test
-- started at 04/03/2025 14:39:30 --
mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
Command line used: ./mdtest '-a' 'LIBNFS' '--libnfs.url=nfs://lab008/mnt/version=3' '-w' '0' '-I' '128' '-i' '10' '-z' '0' '-b' '0' '-F' '-d' '/test'
Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Path : /test
FS : 38.2 GiB Used FS: 41.3% Inodes: 2.4 Mi Used Inodes: 5.8%
128 tasks, 16384 files
SUMMARY rate (in ops/sec): (of 10 iterations)
Operation                      Max         Min        Mean     Std Dev
---------                      ---         ---        ----     -------
File creation             7147.432    6789.531    6996.044     132.149
File stat                97175.603   57844.142   91063.340   12000.718
File read                97004.685   48234.620   89099.077   14715.699
File removal             25172.919   23405.880   24424.384     577.264
Tree creation             2375.031     555.537    1982.139     561.013
Tree removal                99.443      95.475      97.632       1.266
-- finished at 04/03/2025 14:40:05 --
```
TEST 2: in-kernel client
```
$ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /mnt/test
-- started at 04/03/2025 14:36:09 --
mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Path : /mnt/test
FS : 38.2 GiB Used FS: 41.3% Inodes: 2.4 Mi Used Inodes: 5.8%
128 tasks, 16384 files
SUMMARY rate (in ops/sec): (of 10 iterations)
Operation                      Max         Min        Mean     Std Dev
---------                      ---         ---        ----     -------
File creation             2301.914    2046.406    2203.859      88.793
File stat               101396.240   77386.014   91270.677    6229.657
File read                43631.081   36858.229   40800.066    2534.255
File removal              3102.328    2647.649    2840.170     153.959
Tree creation             2142.137     253.739    1710.416     620.293
Tree removal                42.922      25.670      36.604       4.820
-- finished at 04/03/2025 14:38:28 --
```
Obviously, the kernel client shares the TCP connection. So this is
either (a) expected behavior, (b) client-side thread starvation, or
(c) server-side thread starvation. The last option is unlikely, as we
first observed the same behavior with the dCache NFS server
implementation before falling back to the Linux kernel nfsd.
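That a single shared connection is in use is easy to confirm from the
client; a quick check, assuming the server is lab008 and the standard
NFS port 2049:
```
# count TCP connections from this client to the server's NFS port
$ ss -tn dst lab008 | grep -c ':2049'
```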
Best regards,
Tigran.
[1]: https://github.com/hpc/ior
[2]: https://github.com/sahlberg/libnfs
-----------------------------
DESY-IT, Scientific Computing
* Re: NFS client low performance in concurrent environment.
From: NeilBrown @ 2025-04-04 3:14 UTC
To: Mkrtchyan, Tigran; +Cc: linux-nfs
On Fri, 04 Apr 2025, Mkrtchyan, Tigran wrote:
> [...]
> Obviously, the kernel client shares the TCP connection. So this is
> either (a) expected behavior, (b) client-side thread starvation, or
> (c) server-side thread starvation. The last option is unlikely, as we
> first observed the same behavior with the dCache NFS server
> implementation before falling back to the Linux kernel nfsd.
If you think the problem is the kernel client sharing a single TCP
connection, it would be worth adding the "nconnect=8" mount option to
see if that makes a difference.
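Something along these lines (nconnect has been supported since kernel
5.3; server name and mount point taken from your report):
```
$ umount /mnt
# open up to 8 TCP connections to the server and spread RPCs across them
$ mount -t nfs -o vers=3,nconnect=8 lab008:/mnt /mnt
```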
If all these file operations are happening in a single directory, then
the problem is probably contention on the directory lock. The Linux VFS
holds an exclusive lock on a directory while creating or removing any
file in it. If you can shard the operations over multiple directories,
you can ease the contention - a sketch follows below.
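Roughly like this; note that the '@'-separated multi-directory syntax
for -d is an assumption about your mdtest build - per-rank
subdirectories created by a small wrapper achieve the same effect:
```
# pre-create eight shard directories on the export
$ for i in $(seq 0 7); do mkdir -p /mnt/test/shard$i; done
# build the '@'-separated directory list with seq
$ dirs=$(seq -s@ -f /mnt/test/shard%g 0 7)
# the same workload as before, spread over the eight directories
$ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -w 0 -I 128 -i 10 -z 0 -b 0 -F -d "$dirs"
```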
I am working on removing the dependency on the directory lock, but I
don't have a patch for you to try - unless you are happy to work on a
three-year-old kernel.
There is a patch set here:
https://lore.kernel.org/all/166147828344.25420.13834885828450967910.stgit@noble.brown/
which should work on a kernel of that time.
NeilBrown
* Re: NFS client low performance in concurrent environment.
From: Mkrtchyan, Tigran @ 2025-04-05 17:10 UTC
To: NeilBrown; +Cc: linux-nfs
Hi Neil,
The behavior you describe in the patch series matches our observations.
I briefly went through the patches; unfortunately, my kernel skills are
not strong enough to follow the changes in detail. As I understand it,
it is up to the filesystem implementation either to handle parallel
creations itself or to require higher-level locking to ensure integrity.
That is what the FS_PAR_DIR_UPDATE flag signals, right?
From the comments on the thread, I got the impression that the
implementation was considered overcomplicated. I am happy to build and
test the changes, though it sounds like the series won't be accepted
as-is.
Best regards,
Tigran.