Linux NFS development
From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: NeilBrown <neil@brown.name>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: NFS client low performance in concurrent environment.
Date: Sat, 5 Apr 2025 19:10:51 +0200 (CEST)	[thread overview]
Message-ID: <1266597584.24052706.1743873051468.JavaMail.zimbra@desy.de> (raw)
In-Reply-To: <174373648629.9342.17081599824511256253@noble.neil.brown.name>

Hi NeilBrown,

The behavior you describe in the patch series matches our observations.
I have briefly gone through the patches; unfortunately, my kernel skills are
not strong enough to follow the changes in detail. As I understand it, each
filesystem either handles parallel creations itself or requires high-level
locking to ensure integrity, and that is what the FS_PAR_DIR_UPDATE flag
indicates, right?

From the comments on the thread, I got the impression that the implementation
is seen as overcomplicated. I can build and test the changes, but it looks like
the series won't be accepted as-is.

Best regards,
   Tigran.

----- Original Message -----
> From: "NeilBrown" <neil@brown.name>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Friday, 4 April, 2025 05:14:46
> Subject: Re: NFS client low performance in concurrent environment.

> On Fri, 04 Apr 2025, Mkrtchyan, Tigran wrote:
>> Dear NFS fellows,
>> 
>> As part of a research effort, we have adapted IOR[1], a benchmark well known
>> in the HPC community, to support libnfs[2]. After running a series of tests,
>> our observation is that multiple userspace clients achieve higher throughput
>> than the in-kernel client (or server).
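
(Side note for context: with the libnfs backend every MPI rank is its own
userspace NFS client with its own TCP connection to the server. Below is a
rough, hypothetical sketch of what one such per-process client looks like with
the libnfs synchronous API; the real IOR/mdtest backend is more involved, and
the server name and paths merely mirror the test setup described below, where
NFSv3 is selected via the "version=3" URL parameter.)

```
/*
 * Rough sketch only -- not the actual IOR/mdtest libnfs backend.
 * One process, one nfs_context, one TCP connection.
 *
 * Build (assuming libnfs is installed):  cc nfs_create_demo.c -lnfs
 */
#include <stdio.h>
#include <nfsc/libnfs.h>

int main(void)
{
	struct nfs_context *nfs = nfs_init_context();
	struct nfsfh *fh;

	if (nfs == NULL) {
		fprintf(stderr, "cannot allocate libnfs context\n");
		return 1;
	}

	if (nfs_mount(nfs, "lab008", "/mnt") != 0) {
		fprintf(stderr, "mount failed: %s\n", nfs_get_error(nfs));
		nfs_destroy_context(nfs);
		return 1;
	}

	/* create, close and remove one zero-byte file, as mdtest -w 0 -F does */
	if (nfs_creat(nfs, "/test/file.0", 0644, &fh) == 0) {
		nfs_close(nfs, fh);
		nfs_unlink(nfs, "/test/file.0");
	} else {
		fprintf(stderr, "creat failed: %s\n", nfs_get_error(nfs));
	}

	nfs_destroy_context(nfs);
	return 0;
}
```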
>> 
>> In the test below, the NFS server runs RHEL9 with kernel
>> 5.14.0-503.23.1.el9_5.x86_64 and exports /mnt. The results are in operations
>> per second, so higher numbers are better.
>> 
>> The client is a single 80-core host running RHEL9 with kernel
>> 5.14.0-427.26.1.el9_4.x86_64. We used NFSv3 in the test to eliminate NFSv4's
>> open/close overhead on zero-byte files.
>> 
>> 
>> TEST 1: libnfs
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE  ./mdtest  -a LIBNFS
>> --libnfs.url='nfs://lab008/mnt/?uid=0&gid=0&version=3' -w 0 -I 128 -i 10 -z 0
>> -b 0 -F -d /test
>> -- started at 04/03/2025 14:39:30 --
>> 
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Command line used: ./mdtest '-a' 'LIBNFS'
>> '--libnfs.url=nfs://lab008/mnt/version=3' '-w' '0' '-I' '128' '-i' '10' '-z'
>> '0' '-b' '0' '-F' '-d' '/test'
>> Nodemap:
>> 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path                : /test
>> FS                  : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%
>> 128 tasks, 16384 files
>> 
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>>    Operation                     Max            Min           Mean        Std Dev
>>    ---------                     ---            ---           ----        -------
>>    File creation                7147.432       6789.531       6996.044    132.149
>>    File stat                   97175.603      57844.142      91063.340  12000.718
>>    File read                   97004.685      48234.620      89099.077  14715.699
>>    File removal                25172.919      23405.880      24424.384    577.264
>>    Tree creation                2375.031        555.537       1982.139    561.013
>>    Tree removal                   99.443         95.475         97.632      1.266
>> -- finished at 04/03/2025 14:40:05 --
>> ```
>> 
>> 
>> TEST 2: in-kernel client
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE  ./mdtest  -w 0 -I 128 -i 10 -z 0 -b 0
>> -F -d /mnt/test
>> -- started at 04/03/2025 14:36:09 --
>> 
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Nodemap:
>> 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path                : /mnt/test
>> FS                  : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%
>> 128 tasks, 16384 files
>> 
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>>    Operation                     Max            Min           Mean        Std Dev
>>    ---------                     ---            ---           ----        -------
>>    File creation                2301.914       2046.406       2203.859     88.793
>>    File stat                  101396.240      77386.014      91270.677   6229.657
>>    File read                   43631.081      36858.229      40800.066   2534.255
>>    File removal                 3102.328       2647.649       2840.170    153.959
>>    Tree creation                2142.137        253.739       1710.416    620.293
>>    Tree removal                   42.922         25.670         36.604      4.820
>> -- finished at 04/03/2025 14:38:28 --
>> ```
>> 
>> 
>> Obviously, all processes on the kernel client share a single TCP connection.
>> So this is either (a) expected behavior, (b) client thread starvation, or
>> (c) server thread starvation. The last option is unlikely, as we first
>> observed the behavior with the dCache NFS server implementation before
>> falling back to the Linux kernel nfsd.
> 
> If you think the problem is the kernel client sharing a single TCP
> connection, it would be worth adding the "nconnect=8" mount option to see if
> that makes a difference.
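
(For reference: nconnect has been in mainline since v5.3, so the RHEL9
5.14-based client kernel used here should support it. It is just an extra
mount option, something like
    mount -t nfs -o vers=3,nconnect=8 lab008:/mnt /mnt
which opens several TCP connections for the one mount and spreads the RPC
traffic across them; the mount point above is only an example.)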
> 
> If all these file operations are happening in the one directory then the
> problem is probably contention on the directory lock.  The Linux VFS
> holds an exclusive lock on the directory while creating or removing any
> files in that directory.  If you can shard the operations over multiple
> directories you can ease the contention.
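
(If it helps, here is a minimal, hypothetical illustration of that sharding
idea using plain POSIX calls; the paths, rank argument and file count are made
up. If I remember correctly, mdtest's -u option, which gives each task a
unique working directory, achieves much the same effect.)

```
/*
 * Hypothetical sketch: shard file creation across per-process
 * subdirectories so each process contends on a different parent
 * directory lock instead of one shared directory.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int rank = argc > 1 ? atoi(argv[1]) : 0;	/* e.g. the MPI rank */
	char dir[256], path[512];

	/* one subdirectory per process instead of one shared directory */
	snprintf(dir, sizeof(dir), "/mnt/test/shard.%d", rank);
	if (mkdir(dir, 0755) != 0 && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}

	for (int i = 0; i < 128; i++) {
		snprintf(path, sizeof(path), "%s/file.%d", dir, i);
		int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
		if (fd < 0) {
			perror("open");
			continue;
		}
		close(fd);	/* zero-byte files, like mdtest -w 0 */
	}
	return 0;
}
```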
> 
> I am working on removing the dependency on the directory lock, but I
> don't have a patch for you to try - unless you are happy to work on a
> three-year-old kernel.
> There is a patch set here:
>   https://lore.kernel.org/all/166147828344.25420.13834885828450967910.stgit@noble.brown/
> which should work on a kernel of that time.
> 
> NeilBrown
> 
>> 
>> Best regards,
>>    Tigran.
>> 
>> 
>> [1]: https://github.com/hpc/ior
>> [2]: https://github.com/sahlberg/libnfs
>> 
>> -----------------------------
>> DESY-IT, Scientific Computing


Thread overview: 3+ messages
2025-04-03 13:01 NFS client low performance in concurrent environment Mkrtchyan, Tigran
2025-04-04  3:14 ` NeilBrown
2025-04-05 17:10   ` Mkrtchyan, Tigran [this message]
