Linux NFS development
From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: Jeff Layton <jlayton@kernel.org>
Cc: Piyush Sachdeva <s.piyush1024@gmail.com>,
	 linux-nfs <linux-nfs@vger.kernel.org>,
	Chuck Lever <cel@kernel.org>,  trondmy <trondmy@kernel.org>,
	sfrench@samba.org,  sprasad@microsoft.com,
	vaibsharma@microsoft.com
Subject: Re: NFS delegations behavior analysis
Date: Tue, 23 Jun 2026 13:04:31 +0200 (CEST)
Message-ID: <455619640.1622514.1782212671358.JavaMail.zimbra@desy.de>
In-Reply-To: <0b39c1e01a92f99fe456c76523ec7f3aa5dc1a81.camel@kernel.org>


----- Original Message -----
> From: "Jeff Layton" <jlayton@kernel.org>
> To: "Piyush Sachdeva" <s.piyush1024@gmail.com>, "linux-nfs" <linux-nfs@vger.kernel.org>, "Chuck Lever" <cel@kernel.org>,
> "trondmy" <trondmy@kernel.org>, sfrench@samba.org, sprasad@microsoft.com
> Cc: vaibsharma@microsoft.com
> Sent: Tuesday, 23 June, 2026 12:50:16
> Subject: Re: NFS delegations behavior analysis

> On Tue, 2026-06-23 at 15:31 +0530, Piyush Sachdeva wrote:
>> Hi,
>> Lately I have been running micro benchmarks around the `ls` command and
>> reading through the code documentation of the NFS client to better
>> understand the client side caching behavior with and without
>> delegations.
>> 
>> Understanding so far:
>> Delegations (both file and directory) are granted by the server to the
>> client, indefinitely (until revoked or under the watermark) to cache
>> attributes. The caching of data is a result of the attribute
>> cache. Hence, a directory delegation will cache the directory
>> attributes and the names of the files in the directory, and a file
>> delegation will cache the attributes of the file and the file data.
>> 
>> Workload run:
>> I focused on the 2 workloads below, doing 2 passes of a large flat
>> directory (with close to 100K files) -
>> a cold pass, and warm pass using the cache from the cold pass:
>> - lslr - ls -lR on both runs
>> - lsmix - ls -R (cold) and then ls -lR (warm)
>> 
>> I also played with the rdirplus behavior using both the default
>> heuristic behavior and the `rdirplus=force` set at mount time.
>> 
>> Numbers:
>> actimeo=5s, rdirplus=force, ACLs off, flat_dir
>> ==================================================================
>> 
>>                  |         LSLR          |         LSMIX
>>                  |  (ls -lR cold / warm) |  (p1 ls -R / p2 ls -lR)
>> Operation        |  flat cold  | flat warm |   flat p1   | flat p2
>> -----------------+-------------+-----------+-------------+---------
>> READDIR calls    |    27       |     0     |   27        |    0
>> READDIR recv B   | 23,603,024  |     0     | 23,603,024  |    0
>>    call type     | readdirplus |    --     | readdirplus |    --
>> LOOKUP           |     1       |     0     |    1        |    0
>> GETATTR          |     3       |  100,000  |    2        | 100,001
>> ACCESS           |     2       |     0     |    2        |    0
>> -----------------+-------------+-----------+-------------+---------
>> Elapsed (age)    |  ~14 s      |  ~62 s    |   ~16 s     |  ~63 s
>> 
>> 
>> Observations:
>> When doing `ls` or `ls -l` on a directory, due to the open(2) on the
>> directory, the client gets a directory delegation - caching the
>> directory attributes and file names. However, we don't have file
>> delegations, because there are no open(2) calls on any of the files.
>> Hence, the cache of file attributes is governed by `actimeo`.
>> Now here is the interesting bit: if the next `ls -l` is issued after
>> `actimeo` expires, a massive GETATTR storm hits the server, doing stat()
>> calls for every file in the directory. As a result, the performance of
>> this warm `ls -l` run ends up being worse than the cold pass. I am
>> guessing this is most likely due to the compounded "rdirplus" being more
>> efficient than stat() calls.
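
A rough per-entry breakdown of the LSLR columns may help to see the gap. This is only arithmetic on the numbers quoted in the table; the entry count is assumed to be exactly 100,000 ("close to 100K files" in the original), and the elapsed times include whatever `ls` itself spends on the client, so the per-call figures are upper bounds rather than measured RPC latencies.

```python
# Figures copied from the LSLR columns of the table above.
readdir_calls = 27
readdir_bytes = 23_603_024
entries = 100_000        # assumption: "close to 100K files" taken as exact
cold_elapsed_s = 14.0    # "~14 s", cold pass (READDIR with attributes)
warm_getattrs = 100_000
warm_elapsed_s = 62.0    # "~62 s", warm pass (one GETATTR per file)

# Cold pass: how much each READDIR reply carries.
bytes_per_reply = readdir_bytes / readdir_calls
bytes_per_entry = readdir_bytes / entries
entries_per_reply = entries / readdir_calls

# Wall-clock time per entry in each pass (upper bounds, see note above).
ms_per_entry_cold = cold_elapsed_s * 1000 / entries
ms_per_getattr_warm = warm_elapsed_s * 1000 / warm_getattrs

print(f"bytes per READDIR reply : {bytes_per_reply:,.0f}")
print(f"bytes per entry         : {bytes_per_entry:.0f}")
print(f"entries per READDIR     : {entries_per_reply:.0f}")
print(f"cold ms per entry       : {ms_per_entry_cold:.2f}")
print(f"warm ms per GETATTR     : {ms_per_getattr_warm:.2f}")
```

So each READDIR reply carries a few thousand entries with attributes, while the warm pass pays one round trip per file.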
>> 
>> 
>> Proposal:
>> For large directories, this ends up being a massive problem, taking 1-2
>> minutes when enumerating a directory on the warm passes.
>> - An easier way to tackle this could be to do a rdirplus=[auto | forced]
>>   instead of issuing the stat(2) storm to the server: When the client
>>   notices that there are cache misses, which would be the case of file
>>   attributes, instead of fetching file names from the directory-delegation
>>   cache and attributes from GETATTR, the client does a READDIRPLUS to
>>   the server, nonetheless.
>> - A more tedious approach would be to cache file attributes as well, as
>>   part of the directory delegation. This would end up requiring a change
>>   in the NFS protocol spec though.
>> - Bulk GETATTR calls: I am uncertain of the feasibility of this, but
>>   what if the client could issue a single GETATTR call to get
>>   attributes for multiple files?
> 
> 
> ls is such a hard workload to get right, because we don't really get an

100% agree. There have been a couple of attempts to address this issue
(the slow second ls).

> indication in the kernel of what userland's intentions are. It's
> basically a readdir() call followed by a bunch of stat()'s, but at the
> point where we're getting the readdir() call, we don't know if userland
> intends to stat() those files or not. We have to make a guess about
> that intention.
> 
> In this case, it sounds like the directory cache was valid, so the
> client decided it didn't need to do a READDIR at all, but the
> individual files had caches that timed out.
> 
> So imagine you're the kernel client and have been given that second
> readdir() call: Why should you decide to do a READDIRPLUS at that point
> instead of a regular READDIR?

Maybe we need some kind of client-side heuristic, like the one on the
server side for open delegations, where after seeing some stat() calls
for files in the same directory, the client decides to switch to
READDIR (v4) to get all attributes in one go.
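
To make the idea concrete, here is a toy model of such a heuristic. It is a sketch only, not how the Linux client works today: the class name, the `threshold` value, and the "bulk_readdir" action are all hypothetical, and it assumes a single bulk refresh repopulates the attributes of every file in the directory (in practice that would be several READDIR calls, 27 in the table above).

```python
from collections import defaultdict


class DirStatHeuristic:
    """Toy model: count attribute-cache misses per directory and
    suggest a bulk refresh once they cross a threshold (hypothetical)."""

    def __init__(self, threshold=8):
        self.threshold = threshold          # assumed tunable, not a real knob
        self.misses = defaultdict(int)      # directory id -> recent miss count

    def on_stat(self, dir_id, attrs_cached):
        """Return "cached", "getattr", or "bulk_readdir" for one stat()."""
        if attrs_cached:
            return "cached"
        self.misses[dir_id] += 1
        if self.misses[dir_id] >= self.threshold:
            self.misses[dir_id] = 0         # start counting afresh
            return "bulk_readdir"
        return "getattr"


def simulate(num_files, threshold):
    """Walk one directory the way `ls -l` would, with all attributes expired."""
    heuristic = DirStatHeuristic(threshold)
    fresh = set()                           # files whose attributes are valid
    counts = {"cached": 0, "getattr": 0, "bulk_readdir": 0}
    for i in range(num_files):
        action = heuristic.on_stat("dir", i in fresh)
        counts[action] += 1
        if action == "bulk_readdir":
            # Assumption: the bulk refresh covers the whole directory.
            fresh.update(range(num_files))
    return counts


if __name__ == "__main__":
    print(simulate(100_000, threshold=8))
```

With these assumptions the 100,000 GETATTRs of the warm pass collapse to 7 GETATTRs plus one bulk refresh. The hard part, as Jeff points out, is picking a threshold that does not penalise workloads that only stat() a handful of files.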

Best regards,
   Tigran.

> --
> Jeff Layton <jlayton@kernel.org>
