From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: Jeff Layton <jlayton@kernel.org>
Cc: Piyush Sachdeva <s.piyush1024@gmail.com>,
	 linux-nfs <linux-nfs@vger.kernel.org>,
	Chuck Lever <cel@kernel.org>,  trondmy <trondmy@kernel.org>,
	sfrench@samba.org,  sprasad@microsoft.com,
	vaibsharma@microsoft.com
Subject: Re: NFS delegations behavior analysis
Date: Tue, 23 Jun 2026 13:04:31 +0200 (CEST)
Message-ID: <455619640.1622514.1782212671358.JavaMail.zimbra@desy.de>
In-Reply-To: <0b39c1e01a92f99fe456c76523ec7f3aa5dc1a81.camel@kernel.org>

----- Original Message -----
> From: "Jeff Layton" <jlayton@kernel.org>
> To: "Piyush Sachdeva" <s.piyush1024@gmail.com>, "linux-nfs" <linux-nfs@vger.kernel.org>, "Chuck Lever" <cel@kernel.org>,
> "trondmy" <trondmy@kernel.org>, sfrench@samba.org, sprasad@microsoft.com
> Cc: vaibsharma@microsoft.com
> Sent: Tuesday, 23 June, 2026 12:50:16
> Subject: Re: NFS delegations behavior analysis

> On Tue, 2026-06-23 at 15:31 +0530, Piyush Sachdeva wrote:
>> Hi,
>> Lately I have been running micro benchmarks around the `ls` command and
>> reading through the code documentation of the NFS client to better
>> understand the client side caching behavior with and without
>> delegations.
>> 
>> Understanding so far:
>> Delegations (both file and directory) are granted by the server to the
>> client, indefinitely (until revoked or under the watermark) to cache
>> attributes. The caching of data is a result of the attribute
>> cache. Hence, a directory delegation will cache the directory
>> attributes and the names of the files in the directory, and a file
>> delegation will cache the attributes of the file and the file data.
>> 
>> Workload run:
>> I focused on the 2 workloads below, doing 2 passes over a large flat
>> directory (with close to 100K files):
>> a cold pass, and a warm pass using the cache from the cold pass:
>> - lslr - ls -lR on both runs
>> - lsmix - ls -R (cold) and then ls -lR (warm)
>> 
>> I also played with the rdirplus behavior using both the default
>> heuristic behavior and the `rdirplus=force` set at mount time.
>> 
>> Numbers:
>> actimeo=5s, rdirplus=force, ACLs off, flat_dir
>> ==================================================================
>> 
>>                  |         LSLR          |         LSMIX
>>                  |  (ls -lR cold / warm) |  (p1 ls -R / p2 ls -lR)
>> Operation        |  flat cold  | flat warm |   flat p1   | flat p2
>> -----------------+-------------+-----------+-------------+---------
>> READDIR calls    |    27       |     0     |   27        |    0
>> READDIR recv B   | 23,603,024  |     0     | 23,603,024  |    0
>>    call type     | readdirplus |    --     | readdirplus |    --
>> LOOKUP           |     1       |     0     |    1        |    0
>> GETATTR          |     3       |  100,000  |    2        | 100,001
>> ACCESS           |     2       |     0     |    2        |    0
>> -----------------+-------------+-----------+-------------+---------
>> Elapsed (age)    |  ~14 s      |  ~62 s    |   ~16 s     |  ~63 s
>> 
>> 
>> Observations:
>> When doing `ls` or `ls -l` on a directory, due to the open(2) on the
>> directory, the client gets a directory delegation, caching the
>> directory attributes and file names. However, we don't have file
>> delegations, since there are no open(2) calls on any of the files.
>> Hence, the cache of file attributes is governed by `actimeo`.
>> Now here is the interesting bit: if the next `ls -l` is issued after
>> `actimeo` has expired, a massive GETATTR storm hits the server, one
>> for the stat() call on every file in the directory. As a result, the
>> performance of this warm `ls -l` run ends up being worse than the cold
>> pass. I am guessing this is most likely due to the compounded
>> "rdirplus" being more efficient than per-file stat() calls.
>> 
>> 
>> Proposal:
>> For large directories, this ends up being a massive problem, taking 1-2
>> minutes when enumerating a directory on the warm passes.
>> - An easier way to tackle this could be to do a rdirplus=[auto | forced]
>>   instead of issuing the stat(2) storm to the server: When the client
>>   notices that there are cache misses, which would be the case of file
>>   attributes, instead of fetching file names from the directory-delegation
>>   cache and attributes from GETATTR, the client does a READDIRPLUS to
>>   the server, nonetheless.
>> - A more tedious approach would be to cache file attributes as well,
>>   as a part of the directory delegation. This would end up requiring a
>>   change in the NFS protocol spec though.
>> - Bulk GETATTR calls: I am uncertain of the feasibility of this, but
>>   what if the client could do 1 GETATTR call to get attributes
>>   for multiple files?
> 
> 
> ls is such a hard workload to get right, because we don't really get an

100% agree. And there were a couple of attempts to address this issue
(second ls that is slow).

> indication in the kernel of what userland's intentions are. It's
> basically a readdir() call followed by a bunch of stat()'s, but at the
> point where we're getting the readdir() call, we don't know if userland
> intends to stat() those files or not. We have to make a guess about
> that intention.
> 
> In this case, it sounds like the directory cache was valid, so the
> client decided it didn't need to do a READDIR at all, but the
> individual files had caches that timed out.
> 
> So imagine you're the kernel client and have been given that second
> readdir() call: Why should you decide to do a READDIRPLUS at that point
> instead of a regular READDIR?

Maybe we need some kind of client-side heuristic, like the one on the
server side for open delegations, where after seeing some stat() calls
for files in the same directory, the client decides to switch to
READDIR (v4) to get all attributes in one go.
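
To make the idea a bit more concrete, here is a rough sketch in Python
(not kernel code) of what such a heuristic could look like. The class
name, the threshold of 8 and the simulate() helper are all made up for
illustration, and the model is simplified: a bulk refetch is counted as
a single call and is assumed to refresh every child of the directory.

```python
from collections import defaultdict


class DirStatHeuristic:
    """Hypothetical per-directory counter of attribute-cache misses."""

    def __init__(self, miss_threshold=8):
        # miss_threshold is an illustrative value, not a tuned one.
        self.miss_threshold = miss_threshold
        self.misses = defaultdict(int)

    def record_stat(self, dir_id, attr_cache_hit):
        """Call for every stat() of a child of dir_id.

        Returns True when the caller should stop issuing per-file
        GETATTRs and refetch the directory with attributes instead.
        """
        if attr_cache_hit:
            return False
        self.misses[dir_id] += 1
        return self.misses[dir_id] >= self.miss_threshold

    def directory_refetched(self, dir_id):
        """Reset the counter once a bulk refetch refreshed the children."""
        self.misses.pop(dir_id, None)


def simulate(num_files, heuristic):
    """Count calls for a warm `ls -l` where every file's attributes expired.

    Returns (per_file_getattrs, bulk_refetches) under the simplified
    model described above.
    """
    getattrs = 0
    bulk = 0
    refreshed = False
    for _ in range(num_files):
        if refreshed:
            continue  # attributes are now served from the cache
        if heuristic.record_stat("dir", attr_cache_hit=False):
            bulk += 1
            refreshed = True
            heuristic.directory_refetched("dir")
        else:
            getattrs += 1
    return getattrs, bulk
```

With the threshold at 8, simulate(100_000, DirStatHeuristic()) issues 7
per-file GETATTRs and then one bulk refetch, instead of 100,000
GETATTRs. A real client would also need to decide when to age out the
counter and how to avoid refetching for workloads that only stat a few
files.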

Best regards,
   Tigran.

> --
> Jeff Layton <jlayton@kernel.org>
