From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: NeilBrown <neil@brown.name>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: NFS client low performance in concurrent environment.
Date: Sat, 5 Apr 2025 19:10:51 +0200 (CEST)
Message-ID: <1266597584.24052706.1743873051468.JavaMail.zimbra@desy.de>
In-Reply-To: <174373648629.9342.17081599824511256253@noble.neil.brown.name>
Hi NeilBrown,
The behavior you describe in the patch series matches our observations.
I briefly went through the patches. Unfortunately, my kernel skills are
not strong enough to follow the changes in detail. Obviously, it's up to the
filesystem implementation either to handle parallel creations itself or to
require higher-level locking to ensure integrity. This is what the
FS_PAR_DIR_UPDATE flag is for, right?
From the comments on the thread, I got the impression that the implementation
is considered overcomplicated. I can build and test the changes. However, the
series won't be accepted as-is.
Best regards,
Tigran.
----- Original Message -----
> From: "NeilBrown" <neil@brown.name>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Friday, 4 April, 2025 05:14:46
> Subject: Re: NFS client low performance in concurrent environment.
> On Fri, 04 Apr 2025, Mkrtchyan, Tigran wrote:
>> Dear NFS fellows,
>>
>> As part of our research, we have adapted IOR[1], a benchmark well known in the
>> HPC community, to support libnfs[2]. After running a number of tests, our
>> observation is that multiple clients in userspace achieve higher throughput
>> than the in-kernel client (or server).
>>
>> In the test below, the NFS server runs on RHEL9 with kernel
>> 5.14.0-503.23.1.el9_5.x86_64, exporting /mnt. The results are in operations
>> per second, so higher numbers are better.
>>
>> The client is a single 80-core host running RHEL9 with kernel
>> 5.14.0-427.26.1.el9_4.x86_64. We used NFSv3 in the test to eliminate NFSv4's
>> open/close overhead on zero-byte files.
>>
>>
>> TEST 1: libnfs
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -a LIBNFS \
>>     --libnfs.url='nfs://lab008/mnt/?uid=0&gid=0&version=3' \
>>     -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /test
>> -- started at 04/03/2025 14:39:30 --
>>
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Command line used: ./mdtest '-a' 'LIBNFS'
>> '--libnfs.url=nfs://lab008/mnt/version=3' '-w' '0' '-I' '128' '-i' '10' '-z'
>> '0' '-b' '0' '-F' '-d' '/test'
>> Nodemap:
>> 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path : /test
>> FS : 38.2 GiB Used FS: 41.3% Inodes: 2.4 Mi Used Inodes: 5.8%
>> 128 tasks, 16384 files
>>
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>> Operation                Max         Min        Mean     Std Dev
>> ---------                ---         ---        ----     -------
>> File creation       7147.432    6789.531    6996.044     132.149
>> File stat          97175.603   57844.142   91063.340   12000.718
>> File read          97004.685   48234.620   89099.077   14715.699
>> File removal       25172.919   23405.880   24424.384     577.264
>> Tree creation       2375.031     555.537    1982.139     561.013
>> Tree removal          99.443      95.475      97.632       1.266
>> -- finished at 04/03/2025 14:40:05 --
>> ```
>>
>>
>> TEST 2: in-kernel client
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest \
>>     -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /mnt/test
>> -- started at 04/03/2025 14:36:09 --
>>
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Nodemap:
>> 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path : /mnt/test
>> FS : 38.2 GiB Used FS: 41.3% Inodes: 2.4 Mi Used Inodes: 5.8%
>> 128 tasks, 16384 files
>>
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>> Operation                Max         Min        Mean     Std Dev
>> ---------                ---         ---        ----     -------
>> File creation       2301.914    2046.406    2203.859      88.793
>> File stat         101396.240   77386.014   91270.677    6229.657
>> File read          43631.081   36858.229   40800.066    2534.255
>> File removal        3102.328    2647.649    2840.170     153.959
>> Tree creation       2142.137     253.739    1710.416     620.293
>> Tree removal          42.922      25.670      36.604       4.820
>> -- finished at 04/03/2025 14:38:28 --
>> ```
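
For a quick side-by-side reading of the two runs, the mean rates can simply be divided. The following is a small Python sketch: the numbers are copied from the two SUMMARY tables, while the dictionary names and the `ratios` helper are only illustrative.

```python
# Mean rates (ops/sec) copied from the two mdtest SUMMARY tables.
LIBNFS_MEAN = {
    "File creation": 6996.044,
    "File stat": 91063.340,
    "File read": 89099.077,
    "File removal": 24424.384,
    "Tree creation": 1982.139,
    "Tree removal": 97.632,
}
KERNEL_MEAN = {
    "File creation": 2203.859,
    "File stat": 91270.677,
    "File read": 40800.066,
    "File removal": 2840.170,
    "Tree creation": 1710.416,
    "Tree removal": 36.604,
}


def ratios(userspace, kernel):
    """Return the userspace/kernel mean-rate ratio for each operation."""
    return {op: userspace[op] / kernel[op] for op in userspace}


if __name__ == "__main__":
    for op, ratio in ratios(LIBNFS_MEAN, KERNEL_MEAN).items():
        print(f"{op:<14} {ratio:5.2f}x")
```

A ratio above 1 means the libnfs run was faster for that operation. File creation comes out at roughly 3.2x and file removal at roughly 8.6x, while file stat is about equal in both runs.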
>>
>>
>> Obviously, the kernel client shares a single TCP connection. So either (a) this
>> is expected behavior, (b) client threads are starved, or (c) server threads are
>> starved. The last option is unlikely, as we first observed the behavior with
>> the dCache NFS server implementation before falling back to the Linux kernel
>> nfsd.
>
> If you think the cause is that the kernel client shares the TCP connection,
> then it would be worth adding the "nconnect=8" mount option to see if that
> makes a difference.
>
> If all these file operations are happening in the one directory then the
> problem is probably contention on the directory lock. The Linux VFS
> holds an exclusive lock on the directory while creating or removing any
> files in that directory. If you can shard the operations over multiple
> directories you can ease the contention.
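
One possible way to shard, sketched in Python under stated assumptions: each file is assigned to one of a fixed number of subdirectories by a stable hash of its name, so concurrent creates and removes are spread over several directories instead of contending on one. The `shardNNN` naming, the default of 64 shards, and both helper names are hypothetical and are not taken from mdtest or from the patch set.

```python
import zlib
from pathlib import PurePosixPath


def shard_dir(base, filename, num_shards=64):
    """Pick a subdirectory of `base` from a stable hash of the file name."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(filename.encode("utf-8")) % num_shards
    return PurePosixPath(base) / f"shard{index:03d}"


def sharded_path(base, filename, num_shards=64):
    """Full path of `filename` inside its shard directory."""
    return shard_dir(base, filename, num_shards) / filename


if __name__ == "__main__":
    for name in ("file.0", "file.1", "file.2"):
        print(sharded_path("/mnt/test", name, num_shards=8))
```

The shard directories have to exist before the files are created, and whether this helps for a given workload would still need to be measured.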
>
> I am working on removing the dependency on the directory lock, but I
> don't have a patch for you to try - unless you are happy to work on a
> three-year-old kernel. There is a patch set here:
> https://lore.kernel.org/all/166147828344.25420.13834885828450967910.stgit@noble.brown/
> which should work on a kernel of that time.
>
> NeilBrown
>
>>
>> Best regards,
>> Tigran.
>>
>>
>> [1]: https://github.com/hpc/ior
>> [2]: https://github.com/sahlberg/libnfs
>>
>> -----------------------------
>> DESY-IT, Scientific Computing