* NFS delegations behavior analysis
@ 2026-06-23 10:01 Piyush Sachdeva
2026-06-23 10:50 ` Jeff Layton
0 siblings, 1 reply; 12+ messages in thread
From: Piyush Sachdeva @ 2026-06-23 10:01 UTC (permalink / raw)
To: linux-nfs, cel, trondmy, sfrench, sprasad; +Cc: vaibsharma
Hi,
Lately I have been running microbenchmarks around the `ls` command and
reading through the code documentation of the NFS client to better
understand the client-side caching behavior with and without
delegations.
Understanding so far:
Delegations (both file and directory) are granted by the server to the
client indefinitely (until revoked or under the watermark) to cache
attributes. The caching of data follows from the attribute
cache. Hence, a directory delegation will cache the directory
attributes and the names of the files in the directory, and a file
delegation will cache the attributes of the file and the file data.
Workload run:
I focused on the two workloads below, each doing two passes over a large
flat directory (close to 100K files): a cold pass, then a warm pass that
uses the cache populated by the cold pass:
- lslr - ls -lR on both runs
- lsmix - ls -R (cold) and then ls -lR (warm)
I also played with the rdirplus behavior, using both the default
heuristic and `rdirplus=force` set at mount time.
Numbers:
actimeo=5s, rdirplus=force, ACLs off, flat_dir
==================================================================
                 |          LSLR           |         LSMIX
                 |  (ls -lR cold / warm)   | (p1 ls -R / p2 ls -lR)
Operation        | flat cold   | flat warm | flat p1     | flat p2
-----------------+-------------+-----------+-------------+---------
READDIR calls    | 27          | 0         | 27          | 0
READDIR recv B   | 23,603,024  | 0         | 23,603,024  | 0
call type        | readdirplus | --        | readdirplus | --
LOOKUP           | 1           | 0         | 1           | 0
GETATTR          | 3           | 100,000   | 2           | 100,001
ACCESS           | 2           | 0         | 2           | 0
-----------------+-------------+-----------+-------------+---------
Elapsed (age)    | ~14 s       | ~62 s     | ~16 s       | ~63 s
Observations:
When doing `ls` or `ls -l` on a directory, the open(2) on the
directory gets the client a directory delegation, which caches the
directory attributes and file names. However, we don't have file
delegations, because there are no open(2) calls on any of the files.
Hence, the cache of file attributes is governed by `actimeo`.
Now here is the interesting bit: if the next `ls -l` is issued after
`actimeo` has expired, a massive GETATTR storm hits the server, one
stat() call for every file in the directory. As a result, this warm
`ls -l` run ends up slower than the cold pass (~62 s vs ~14 s in the
numbers above). I am guessing this is most likely because the compounded
"rdirplus" is more efficient than individual stat() calls.
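To put the numbers above in per-entry terms, here is a rough back-of-the-envelope sketch. It only uses figures from the table and assumes the directory holds exactly 100,000 entries (the message says "close to 100K"). The elapsed times are approximate and include client-side work in `ls`, so the per-call costs are upper bounds rather than measured RPC latencies.

```python
# Figures copied from the table in this message.
ENTRIES = 100_000               # assumption: "close to 100K files" taken as exact
READDIRPLUS_CALLS = 27          # cold pass
READDIRPLUS_BYTES = 23_603_024  # cold pass, bytes received
COLD_SECONDS = 14               # approximate elapsed time, cold `ls -lR`
WARM_GETATTRS = 100_000         # warm pass
WARM_SECONDS = 62               # approximate elapsed time, warm `ls -lR`

# Cold pass: names and attributes arrive together in a few large replies.
bytes_per_entry = READDIRPLUS_BYTES / ENTRIES
entries_per_call = ENTRIES / READDIRPLUS_CALLS
cold_ms_per_entry = COLD_SECONDS / ENTRIES * 1000

# Warm pass: one GETATTR round trip per file.
warm_ms_per_getattr = WARM_SECONDS / WARM_GETATTRS * 1000

print(f"cold: ~{bytes_per_entry:.0f} B/entry, ~{entries_per_call:.0f} entries/call, "
      f"~{cold_ms_per_entry:.2f} ms/entry")
print(f"warm: ~{warm_ms_per_getattr:.2f} ms per GETATTR")
print(f"warm/cold elapsed ratio: ~{WARM_SECONDS / COLD_SECONDS:.1f}x")
```

Each GETATTR is individually cheap, but the warm pass pays for a round trip 100,000 times, while the cold pass amortizes its round trips over thousands of entries per reply.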
Proposal:
For large directories, this ends up being a massive problem: enumerating a
directory takes 1-2 minutes on the warm passes.
- An easier way to tackle this could be to apply rdirplus=[auto | forced]
instead of issuing the stat(2) storm to the server: when the client
notices that there are cache misses, as is the case for the file
attributes here, then instead of fetching file names from the
directory-delegation cache and attributes via GETATTR, the client issues
a READDIRPLUS to the server anyway.
- A more tedious approach would be to cache file attributes as well, as
part of the directory delegation. This would end up requiring a change in
the NFS protocol spec, though.
- Bulk GETATTR calls: I am uncertain of the feasibility of this, but
what if the client could do one GETATTR call to get attributes
for multiple files?
--
Piyush
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS delegations behavior analysis
2026-06-23 10:01 NFS delegations behavior analysis Piyush Sachdeva
@ 2026-06-23 10:50 ` Jeff Layton
2026-06-23 11:04 ` Mkrtchyan, Tigran
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2026-06-23 10:50 UTC (permalink / raw)
To: Piyush Sachdeva, linux-nfs, cel, trondmy, sfrench, sprasad; +Cc: vaibsharma
ls is such a hard workload to get right, because we don't really get an
indication in the kernel of what userland's intentions are. It's
basically a readdir() call followed by a bunch of stat()'s, but at the
point where we're getting the readdir() call, we don't know if userland
intends to stat() those files or not. We have to make a guess about
that intention.
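The pattern described above can be sketched from the userspace side. This is a simplified illustration, not what coreutils `ls` actually does internally: the function names `list_names` and `list_long` are made up for this example. The point is that both variants start with the same directory read, and only the second one follows up with a per-entry lstat().

```python
import os
import tempfile


def list_names(path):
    """Roughly what plain `ls` needs: read the directory, names only."""
    return sorted(os.listdir(path))


def list_long(path):
    """Roughly what `ls -l` needs: the same directory read, then one
    lstat() per entry. At directory-read time nothing tells the kernel
    that these lstat() calls are coming."""
    names = sorted(os.listdir(path))
    return [(name, os.lstat(os.path.join(path, name))) for name in names]


# Small local demonstration on a scratch directory.
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        open(os.path.join(d, f"f{i}"), "w").close()
    names = list_names(d)
    long_listing = list_long(d)

print(names)
print([(name, st.st_size) for name, st in long_listing])
```

On an NFS mount, the directory read maps to READDIR or READDIRPLUS (or to the cache), and each lstat() whose attributes have expired maps to a GETATTR.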
In this case, it sounds like the directory cache was valid, so the
client decided it didn't need to do a READDIR at all, but the
individual files had caches that timed out.
So imagine you're the kernel client and have been given that second
readdir() call: Why should you decide to do a READDIRPLUS at that point
instead of a regular READDIR?
--
Jeff Layton <jlayton@kernel.org>
* Re: NFS delegations behavior analysis
2026-06-23 10:50 ` Jeff Layton
@ 2026-06-23 11:04 ` Mkrtchyan, Tigran
2026-06-23 11:10 ` Jeff Layton
2026-06-23 13:11 ` Anna Schumaker
0 siblings, 2 replies; 12+ messages in thread
From: Mkrtchyan, Tigran @ 2026-06-23 11:04 UTC (permalink / raw)
To: Jeff Layton
Cc: Piyush Sachdeva, linux-nfs, Chuck Lever, trondmy, sfrench,
sprasad, vaibsharma
----- Original Message -----
> From: "Jeff Layton" <jlayton@kernel.org>
> To: "Piyush Sachdeva" <s.piyush1024@gmail.com>, "linux-nfs" <linux-nfs@vger.kernel.org>, "Chuck Lever" <cel@kernel.org>,
> "trondmy" <trondmy@kernel.org>, sfrench@samba.org, sprasad@microsoft.com
> Cc: vaibsharma@microsoft.com
> Sent: Tuesday, 23 June, 2026 12:50:16
> Subject: Re: NFS delegations behavior analysis
> ls is such a hard workload to get right, because we don't really get an
100% agree. And there were a couple of attempts to address this issue
(second ls that is slow).
> indication in the kernel of what userland's intentions are. It's
> basically a readdir() call followed by a bunch of stat()'s, but at the
> point where we're getting the readdir() call, we don't know if userland
> intends to stat() those files or not. We have to make a guess about
> that intention.
>
> In this case, it sounds like the directory cache was valid, so the
> client decided it didn't need to do a READDIR at all, but the
> individual files had caches that timed out.
>
> So imagine you're the kernel client and have been given that second
> readdir() call: Why should you decide to do a READDIRPLUS at that point
> instead of a regular READDIR?
Maybe we need some kind of client-side heuristic, like the one on the
server side for open delegations: after seeing some stat() calls for files
in the same directory, the client would decide to switch to READDIR (v4)
to get all attributes in one go.
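As a toy model of such a heuristic (purely illustrative, and not the Linux client's actual logic): count attribute-cache misses per directory and, once a threshold is crossed, refresh the whole directory in bulk instead of continuing one GETATTR at a time. The class name `StatHeuristic`, the threshold of 8, and the path `/mnt/dir` are all assumptions made up for this sketch.

```python
from collections import defaultdict


class StatHeuristic:
    """Hypothetical per-directory miss counter."""

    def __init__(self, threshold=8):
        self.threshold = threshold
        self.misses = defaultdict(int)

    def record_miss(self, directory):
        """Record a stat() that missed the attribute cache for a file in
        `directory`. Returns True when a bulk refresh is advised."""
        self.misses[directory] += 1
        return self.misses[directory] >= self.threshold

    def reset(self, directory):
        """Forget the miss count, e.g. after a bulk refresh."""
        self.misses.pop(directory, None)


def simulate(n_files, threshold):
    """Walk n_files stale entries in one directory and count how many
    individual GETATTRs are issued before a single bulk refresh."""
    h = StatHeuristic(threshold)
    getattrs = bulk_refreshes = 0
    attrs_fresh = False
    for _ in range(n_files):
        if attrs_fresh:
            continue  # served from the refreshed attribute cache
        if h.record_miss("/mnt/dir"):
            bulk_refreshes += 1  # one logical refresh; may be many RPCs
            attrs_fresh = True
            h.reset("/mnt/dir")
        else:
            getattrs += 1
    return getattrs, bulk_refreshes


getattrs, bulk_refreshes = simulate(100_000, threshold=8)
print(getattrs, bulk_refreshes)
```

The hard part, which this sketch ignores, is choosing a threshold that does not regress workloads that stat only a handful of files in a large directory.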
Best regards,
Tigran.
> --
> Jeff Layton <jlayton@kernel.org>
* Re: NFS delegations behavior analysis
2026-06-23 11:04 ` Mkrtchyan, Tigran
@ 2026-06-23 11:10 ` Jeff Layton
2026-06-23 13:11 ` Benjamin Coddington
2026-06-23 13:11 ` Anna Schumaker
1 sibling, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2026-06-23 11:10 UTC (permalink / raw)
To: Mkrtchyan, Tigran, Benjamin Coddington
Cc: Piyush Sachdeva, linux-nfs, Chuck Lever, trondmy, sfrench,
sprasad, vaibsharma
On Tue, 2026-06-23 at 13:04 +0200, Mkrtchyan, Tigran wrote:
> Maybe we need some kind of client-side heuristic, like the one on the
> server side for open delegations: after seeing some stat() calls for files
> in the same directory, the client would decide to switch to READDIR (v4)
> to get all attributes in one go.
>
Yeah, we definitely do. I'm just not sure what those heuristics look
like.
I think Ben did the latest pass of trying to tune the heuristics here.
Any thoughts on how we could do this better (and whether there are
particular ls-ish workloads that we don't want to regress)?
--
Jeff Layton <jlayton@kernel.org>
* Re: NFS delegations behavior analysis
2026-06-23 11:04 ` Mkrtchyan, Tigran
2026-06-23 11:10 ` Jeff Layton
@ 2026-06-23 13:11 ` Anna Schumaker
1 sibling, 0 replies; 12+ messages in thread
From: Anna Schumaker @ 2026-06-23 13:11 UTC (permalink / raw)
To: Tigran Mkrtchyan, Jeff Layton
Cc: Piyush Sachdeva, linux-nfs, Chuck Lever, Trond Myklebust, sfrench,
sprasad, vaibsharma
On Tue, Jun 23, 2026, at 7:04 AM, Mkrtchyan, Tigran wrote:
> Maybe we need some kind of client-side heuristic, like the one on the
> server side for open delegations: after seeing some stat() calls for files
> in the same directory, the client would decide to switch to READDIR (v4)
> to get all attributes in one go.
We do something like that already. I don't think we'll ever have a readdir
plus heuristic that makes everybody happy without userspace somehow telling
us their intentions. I know a statx-based readdir plus system call probably
sounds crazy, but something like that would go a long way to take all the
guesswork out of things on our end.
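To make the idea concrete, here is a userspace emulation of what such an interface might look like from the caller's side. No such system call exists today; `readdir_plus` and its `want_attrs` parameter are hypothetical names for this sketch, and the emulation still issues one stat per entry underneath. What it illustrates is the caller stating its intent up front, in a single call.

```python
import os
import tempfile


def readdir_plus(path, want_attrs):
    """Hypothetical interface: return (name, stat_result or None) pairs.
    `want_attrs` is the caller's declared intent; a real implementation
    could use it to pick READDIRPLUS over READDIR."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False) if want_attrs else None
            entries.append((entry.name, st))
    return sorted(entries, key=lambda pair: pair[0])


# Small local demonstration on a scratch directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("a", "b"):
        with open(os.path.join(d, name), "w") as f:
            f.write("x")
    names_only = readdir_plus(d, want_attrs=False)
    with_attrs = readdir_plus(d, want_attrs=True)

print([name for name, _ in names_only])
print([(name, st.st_size) for name, st in with_attrs])
```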
Anna
* Re: NFS delegations behavior analysis
2026-06-23 11:10 ` Jeff Layton
@ 2026-06-23 13:11 ` Benjamin Coddington
2026-06-23 13:31 ` Daire Byrne
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Benjamin Coddington @ 2026-06-23 13:11 UTC (permalink / raw)
To: Jeff Layton
Cc: Mkrtchyan, Tigran, Benjamin Coddington, Piyush Sachdeva,
linux-nfs, Chuck Lever, trondmy, sfrench, sprasad, vaibsharma
On 23 Jun 2026, at 7:10, Jeff Layton wrote:
> I think Ben did the latest pass of trying to tune the heuristics here.
> Any thoughts on how we could do this better (and whether there are
> particular ls-ish workloads that we don't want to regress)?
I haven't (shame) thought about READDIR in the context of directory
delegations.
But during my time at Red Hat we worked hard to optimize some readdir
problems and I learned that almost any change we made ended up making
someone's workload regress. We also found that our performance benchmarks
rarely matched the most common real-world workloads. We made the mistake of
trying to improve the benchmark which resulted in performance regressions
for real-world users.
Jeff, you've already touched on the core issue regarding fixing this with
bulk GETATTR calls - the kernel doesn't know what syscall pattern the
userspace process is going to use next. `ls -l`, `find`, and
friends have a complex history and branching logic; they use different lookup
and getattr patterns based on their own goals, and NFS cannot optimize for
any one case.
I think the last time we discussed additional improvements there were some
ideas about teaching the readdir code to respond to fadvise flags, but then
you'd also need to teach the utilities how to use them as well, and those
utilities try to be filesystem-agnostic.
It's a tough problem, and sometimes the simplest thing might be to just use
more directories on NFS.
**coffee!**
.... er - so with directory delegations, can we simply re-hydrate the dentry
cache from the directory page mappings if the delegation is still valid?
Does the directory delegation pin the mapping? Clearly I need to look at
the code..
Ben
* Re: NFS delegations behavior analysis
2026-06-23 13:11 ` Benjamin Coddington
@ 2026-06-23 13:31 ` Daire Byrne
2026-06-23 13:32 ` Benjamin Coddington
2026-06-23 13:33 ` Jeff Layton
2 siblings, 0 replies; 12+ messages in thread
From: Daire Byrne @ 2026-06-23 13:31 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jeff Layton, Mkrtchyan, Tigran, Piyush Sachdeva, linux-nfs,
Chuck Lever, trondmy, sfrench, sprasad, vaibsharma
On Tue, 23 Jun 2026 at 14:12, Benjamin Coddington
<ben.coddington@hammerspace.com> wrote:
> .... er - so with directory delegations, can we simply re-hydrate the dentry
> cache from the directory page mappings if the delegation is still valid?
> Does the directory delegation pin the mapping? Clearly I need to look at
> the code..
I have also long hoped for a way to cache or speed up "negative"
lookups using something akin to directory delegations.
So if you have your software (e.g. PYTHONPATH) on an NFS share, then
negative lookups are a huge scalability problem. It would be nice if a
client that got the directory contents could also serve negative
lookups until the server told it that something had changed.
But as has been explained to me multiple times before, there is no
mechanism to say that a client "has the complete cache of directory
contents". But I will forever live in hope!
This kind of path search walking is also something that is unlikely to
ever trigger readdirplus as it does a file and then tries the same
file in another directory etc etc.
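A minimal sketch of that access pattern, assuming a simple first-match search over a list of directories (the helper name `find_on_path` and the file name `mod.py` are made up for illustration; a real Python import probes several candidate file names per directory). Every directory that does not contain the file costs one failed lookup, and no directory is ever read.

```python
import os
import tempfile


def find_on_path(name, search_path):
    """Return (hit, misses): the first path where `name` exists and the
    number of failed (negative) lookups made before finding it."""
    misses = 0
    for directory in search_path:
        candidate = os.path.join(directory, name)
        try:
            os.stat(candidate)
        except FileNotFoundError:
            misses += 1  # negative lookup: a server round trip if uncached
            continue
        return candidate, misses
    return None, misses


# Five search directories; the file lives only in the last one.
with tempfile.TemporaryDirectory() as root:
    dirs = [os.path.join(root, f"d{i}") for i in range(5)]
    for directory in dirs:
        os.mkdir(directory)
    open(os.path.join(dirs[-1], "mod.py"), "w").close()
    hit, misses = find_on_path("mod.py", dirs)
    absent_hit, absent_misses = find_on_path("nope.py", dirs)

print(os.path.basename(os.path.dirname(hit)), misses)
```

With many modules and many clients, those failed lookups multiply quickly, which is why being able to answer them from a known-complete directory cache would help.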
Daire
* Re: NFS delegations behavior analysis
2026-06-23 13:11 ` Benjamin Coddington
2026-06-23 13:31 ` Daire Byrne
@ 2026-06-23 13:32 ` Benjamin Coddington
2026-06-23 13:40 ` Jeff Layton
2026-06-23 16:29 ` Trond Myklebust
2026-06-23 13:33 ` Jeff Layton
2 siblings, 2 replies; 12+ messages in thread
From: Benjamin Coddington @ 2026-06-23 13:32 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jeff Layton, Mkrtchyan, Tigran, Piyush Sachdeva, linux-nfs,
Chuck Lever, trondmy, sfrench, sprasad, vaibsharma
On 23 Jun 2026, at 9:11, Benjamin Coddington wrote:
> .... er - so with directory delegations, can we simply re-hydrate the dentry
> cache from the directory page mappings if the delegation is still valid?
> Does the directory delegation pin the mapping? Clearly I need to look at
> the code..
.. right - we don't keep the file attributes in the mappings today. And,
more to the point - the directory delegation doesn't protect those file
attributes either. We'd need NOTIFY4_CHANGE_CHILD_ATTRS implemented.
Ben
* Re: NFS delegations behavior analysis
2026-06-23 13:11 ` Benjamin Coddington
2026-06-23 13:31 ` Daire Byrne
2026-06-23 13:32 ` Benjamin Coddington
@ 2026-06-23 13:33 ` Jeff Layton
2 siblings, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2026-06-23 13:33 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Mkrtchyan, Tigran, Piyush Sachdeva, linux-nfs, Chuck Lever,
trondmy, sfrench, sprasad, vaibsharma
On Tue, 2026-06-23 at 09:11 -0400, Benjamin Coddington wrote:
> On 23 Jun 2026, at 7:10, Jeff Layton wrote:
>
> > I think Ben did the latest pass of trying to tune the heuristics here.
> > Any thoughts on how we could do this better (and whether there are
> > particular ls-ish workloads that we don't want to regress)?
>
> I haven't (shame) thought about READDIR in the context of directory
> delegations.
>
> But during my time at Red Hat we worked hard to optimize some readdir
> problems and I learned that almost any change we made ended up making
> someone's workload regress. We also found that our performance benchmarks
> rarely matched the most common real-world workloads. We made the mistake of
> trying to improve the benchmark which resulted in performance regressions
> for real-world users.
>
> Jeff, you've already touched on the core issue regarding fixing this with
> bulk GETATTR calls - the kernel doesn't know what syscall pattern the
> userspace process is going to use next. The `ls -l` command and `find` and
> friends have complex history and branching logic, they do different lookup
> and getattr patterns based on their own goals, and NFS cannot optimize for
> any one case.
>
> I think the last time we discussed additional improvements there were some
> ideas about teaching the readdir code to respond to fadvise flags, but then
> you'd also need to teach the utilities how to use them as well, and those
> utilities try to be filesystem-agnostic.
>
> It's a tough problem, and sometimes the simplest thing might be to just use
> more directories on NFS.
>
> **coffee!**
>
> .... er - so with directory delegations, can we simply re-hydrate the dentry
> cache from the directory page mappings if the delegation is still valid?
> Does the directory delegation pin the mapping? Clearly I need to look at
> the code..
>
I think the main problem is not the dcache, but the attributes on the
inodes. The first pass uses a full-attrs READDIR (aka readdirplus) and
everything goes reasonably quickly.
Then, on the second pass, the client skips checking directory
attributes before trusting the dcache because of the delegation (which
is good), but the inodes those dentries point to have an attrcache
timeout, and we end up doing a GETATTR for each statx() call when a
readdirplus would have been cheaper.
I guess after you see a few statx() calls from the same pid on the same
directory in a short timeframe you could switch to doing READDIR, but
that sounds horrid to get right and keep working.
--
Jeff Layton <jlayton@kernel.org>
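For illustration only, the "few statx() calls from the same pid on the same directory in a short timeframe" idea could be modelled as a sliding-window counter. This is a userspace toy, not a proposal for the client code: the `StatBurstDetector` name, the threshold of 4 and the 1 second window are all made-up assumptions, and it ignores the hard parts (memory cost, pid reuse, avoiding regressions for other workloads).

```python
from collections import defaultdict, deque


class StatBurstDetector:
    """Hypothetical heuristic: flag (pid, directory) pairs that stat many children quickly."""

    def __init__(self, threshold=4, window=1.0):
        self.threshold = threshold   # stats needed to call it a burst (assumed value)
        self.window = window         # window length in seconds (assumed value)
        self.events = defaultdict(deque)

    def record_stat(self, pid, directory, now):
        """Record one statx(); return True if a bulk attribute fetch looks worthwhile."""
        timestamps = self.events[(pid, directory)]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        return len(timestamps) >= self.threshold


detector = StatBurstDetector(threshold=4, window=1.0)

# An ls -l style burst: many stats from one pid in quick succession.
burst = [detector.record_stat(100, "/mnt/flat", t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]

# An occasional stat from another pid never reaches the threshold.
slow = [detector.record_stat(200, "/mnt/flat", t) for t in (0.0, 2.0, 4.0, 6.0)]

print(burst, slow)
```

The burst trips the detector on its fourth call, while the slow caller never does. Even in this toy form the tuning problem is visible: any fixed threshold or window will be wrong for some workload.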
* Re: NFS delegations behavior analysis
2026-06-23 13:32 ` Benjamin Coddington
@ 2026-06-23 13:40 ` Jeff Layton
2026-06-23 13:59 ` Benjamin Coddington
2026-06-23 16:29 ` Trond Myklebust
1 sibling, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2026-06-23 13:40 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Mkrtchyan, Tigran, Piyush Sachdeva, linux-nfs, Chuck Lever,
trondmy, sfrench, sprasad, vaibsharma
On Tue, 2026-06-23 at 09:32 -0400, Benjamin Coddington wrote:
> On 23 Jun 2026, at 9:11, Benjamin Coddington wrote:
>
> > .... er - so with directory delegations, can we simply re-hydrate the dentry
> > cache from the directory page mappings if the delegation is still valid?
> > Does the directory delegation pin the mapping? Clearly I need to look at
> > the code..
>
> .. right - we don't keep the file attributes in the mappings today. And,
> more to the point - the directory delegation doesn't protect those file
> attributes either. We'd need NOTIFY4_CHANGE_CHILD_ATTRS implemented.
>
Which I think could be done, at least on the Linux server. fsnotify
does support watching for child attribute changes (FS_EVENT_ON_CHILD |
FS_ATTRIB).
That could be very chatty though. We would need to do the work to make
nfsd send callbacks >1 page in order to keep up, I imagine.
--
Jeff Layton <jlayton@kernel.org>
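As a rough userspace analogue of the above (not the in-kernel fsnotify interface nfsd would use), an inotify watch placed on a directory with IN_ATTRIB reports attribute changes on its children. The sketch below is Linux-only and calls the libc inotify functions through ctypes; the `child_attr_events()` helper is a hypothetical name introduced for this example.

```python
import ctypes
import os
import struct
import tempfile

IN_ATTRIB = 0x00000004        # metadata changed (value from <sys/inotify.h>)
IN_NONBLOCK = os.O_NONBLOCK   # inotify_init1() flag, defined as O_NONBLOCK

libc = ctypes.CDLL(None, use_errno=True)


def child_attr_events(directory, action):
    """Watch `directory` for IN_ATTRIB, run `action()`, return the child names reported."""
    fd = libc.inotify_init1(IN_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_ATTRIB)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        action()
        try:
            buf = os.read(fd, 4096)
        except BlockingIOError:
            return []          # nothing was queued
        names, offset = [], 0
        while offset < len(buf):
            # struct inotify_event: int wd; uint32_t mask, cookie, len; char name[]
            _wd, mask, _cookie, length = struct.unpack_from("iIII", buf, offset)
            offset += 16
            name = buf[offset:offset + length].rstrip(b"\0").decode()
            offset += length
            if mask & IN_ATTRIB and name:
                names.append(name)
        return names
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "child")
    open(path, "w").close()
    changed = child_attr_events(tmp, lambda: os.chmod(path, 0o600))

print(changed)
```

Every chmod, chown or timestamp update on any child produces an event, which hints at how chatty the equivalent callbacks could become on a busy directory.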
* Re: NFS delegations behavior analysis
2026-06-23 13:40 ` Jeff Layton
@ 2026-06-23 13:59 ` Benjamin Coddington
0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Coddington @ 2026-06-23 13:59 UTC (permalink / raw)
To: Jeff Layton
Cc: Benjamin Coddington, Mkrtchyan, Tigran, Piyush Sachdeva,
linux-nfs, Chuck Lever, trondmy, sfrench, sprasad, vaibsharma
On 23 Jun 2026, at 9:40, Jeff Layton wrote:
> On Tue, 2026-06-23 at 09:32 -0400, Benjamin Coddington wrote:
>> On 23 Jun 2026, at 9:11, Benjamin Coddington wrote:
>>
>>> .... er - so with directory delegations, can we simply re-hydrate the dentry
>>> cache from the directory page mappings if the delegation is still valid?
>>> Does the directory delegation pin the mapping? Clearly I need to look at
>>> the code..
>>
>> .. right - we don't keep the file attributes in the mappings today. And,
>> more to the point - the directory delegation doesn't protect those file
>> attributes either. We'd need NOTIFY4_CHANGE_CHILD_ATTRS implemented.
>>
>
> Which I think could be done, at least on the Linux server. fsnotify
> does support watching for child attribute changes (FS_EVENT_ON_CHILD |
> FS_ATTRIB).
>
> That could be very chatty though. We would need to do the work to make
> nfsd send callbacks >1 page in order to keep up, I imagine.
I agree - it could easily end up hurting performance rather than improving
it, by amplifying op counts for information the client might not actually
use.
Ben
* Re: NFS delegations behavior analysis
2026-06-23 13:32 ` Benjamin Coddington
2026-06-23 13:40 ` Jeff Layton
@ 2026-06-23 16:29 ` Trond Myklebust
1 sibling, 0 replies; 12+ messages in thread
From: Trond Myklebust @ 2026-06-23 16:29 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jeff Layton, Mkrtchyan, Tigran, Piyush Sachdeva, linux-nfs,
Chuck Lever, sfrench, sprasad, vaibsharma
On Tue, 2026-06-23 at 09:32 -0400, Benjamin Coddington wrote:
> On 23 Jun 2026, at 9:11, Benjamin Coddington wrote:
>
> > .... er - so with directory delegations, can we simply re-hydrate the dentry
> > cache from the directory page mappings if the delegation is still valid?
> > Does the directory delegation pin the mapping? Clearly I need to look at
> > the code..
>
> .. right - we don't keep the file attributes in the mappings today. And,
> more to the point - the directory delegation doesn't protect those file
> attributes either. We'd need NOTIFY4_CHANGE_CHILD_ATTRS implemented.
>
> Ben
Unlike delegations, notifications are asynchronous by design. Upon
reception of a notification, you may deduce that you should revalidate
the attributes on a file. However the absence of that notification is
insufficient to deduce that it is safe not to revalidate those
attributes.
So when it comes to informing the client whether to use READDIR or
READDIRPLUS, I'm sceptical concerning the value of throwing
NOTIFY4_CHANGE_CHILD_ATTRS at the problem.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
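The asymmetry described above can be written down as a small decision function. This is a simplified sketch of the reasoning, not the client's actual revalidation logic: `must_revalidate()` and its parameters are hypothetical, and the model assumes a file delegation is the only thing that makes skipping revalidation safe.

```python
def must_revalidate(has_file_delegation, notified_of_change, cache_age, attr_timeout):
    """Decide whether cached attributes for one file need a GETATTR.

    A delegation is treated as a guarantee: while it is held, the cache
    can be trusted. A notification is only a hint: it may still be in
    flight, so its absence proves nothing and the usual attribute
    cache timeout still applies.
    """
    if has_file_delegation:
        return False
    if notified_of_change:
        return True
    return cache_age >= attr_timeout


# No notification has arrived, but the attribute cache has timed out:
# the client still has to go to the server.
stale_without_notification = must_revalidate(False, False, cache_age=10, attr_timeout=5)

# Holding a file delegation is what actually removes the GETATTR.
delegated = must_revalidate(True, False, cache_age=10, attr_timeout=5)

print(stale_without_notification, delegated)
```

In this model a notification can only add GETATTRs (by forcing an early revalidation); it can never remove one, which is why it does not help with the 100,000 GETATTRs on the warm pass.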