* REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success;
@ 2026-04-29 5:02 Igor Raits
2026-04-29 7:12 ` Igor Raits
0 siblings, 1 reply; 4+ messages in thread
From: Igor Raits @ 2026-04-29 5:02 UTC
To: NeilBrown, Anna Schumaker, Trond Myklebust, linux-nfs
Cc: Jaroslav Pulchart, Jan Cipa
Hi all,
I think I've run into an NFSv4 client regression and wanted to report
it before I forget the details. Apologies in advance if I'm
mis-reading the code — please correct me if so.
Symptom: an occasional mkdir(2) on an NFSv4 mount returns -EEXIST,
but the directory it was supposed to create is actually present
afterwards. It's reproducible on both NFSv4.0 and NFSv4.2 against an
in-kernel Linux nfsd. Both client and server are running 6.19.14.
Reproducer (random 16-hex names so collisions are not the cause):

    N=2000000; base=/var/gdc/export
    for ((i=1; i<=N; i++)); do
        d=$base/$(openssl rand -hex 8)
        mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
        rmdir "$d" 2>/dev/null
    done

Failures cluster every ~2-3 minutes, and also reliably trigger on the
first mkdir after a few minutes of mount idleness. Each failed mkdir
takes about 100 ms.
strace shows just one syscall, so userspace isn't retrying:

    $ strace -ttt -e trace=mkdir mkdir "$dir"
    mkdir("/var/gdc/export/954ce422698ef4b1", 0777) = -1 EEXIST (File exists)
    +++ exited with 1 +++

A packet capture for one failure (NFSv4.2; the v4.0 capture has the
same shape):

    client → server  CREATE name=...  → NFS4ERR_DELAY (10008)
                     ~100 ms later
    client → server  CREATE name=...  → NFS4_OK              ← dir created
                     ~80 µs later
    client → server  CREATE name=...  → NFS4ERR_EXIST (17)   ← server is right

Three CREATE RPCs from one mkdir(2). The server looks correct: it
returns DELAY, then OK on the retry, then EXIST when the client asks
again for a name that now exists. The client then surfaces that final
EXIST to userspace even though its own previous retry already
succeeded.
While poking around in fs/nfs/nfs4proc.c I noticed nfs4_proc_mkdir()
looks like this in current master:

    do {
        alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
        trace_nfs4_mkdir(dir, &dentry->d_name, err);
        if (err)
            alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
                                                  err, &exception));
    } while (exception.retry);

If I'm reading this right, on a successful retry (err == 0)
nfs4_handle_exception() is skipped, so exception.retry stays at the
value it had after the previous DELAY iteration (which is 1). The
loop then runs once more, sends another CREATE for the same name,
and that one legitimately gets NFS4ERR_EXIST. Other do-while loops
in the same file (e.g. nfs4_proc_symlink) seem to call
nfs4_handle_exception() unconditionally, which would reset
exception.retry to 0 on success and exit the loop.
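
To make the suspected sequence concrete, here is a small userspace C
model of the loop. It is only a sketch under my own assumptions:
toy_server, toy_exception, toy_create(), toy_handle_exception() and
toy_mkdir_gated() are made-up names, not kernel functions, and the
handler models nothing beyond "clear retry on entry, set it again for
NFS4ERR_DELAY". The scripted server replies follow the capture above.

```c
#include <assert.h>
#include <stdbool.h>

/* Status values as they appear in the capture. */
#define NFS4_OK        0
#define NFS4ERR_EXIST  17
#define NFS4ERR_DELAY  10008

/* Hypothetical stand-in for struct nfs4_exception; only retry is modelled. */
struct toy_exception {
    bool retry;
};

/* Hypothetical server: counts the CREATE calls it has seen for one name. */
struct toy_server {
    int creates_seen;
};

/* First CREATE is delayed, the second succeeds, later ones find the name. */
static int toy_create(struct toy_server *srv)
{
    assert(srv);
    srv->creates_seen++;
    if (srv->creates_seen == 1)
        return -NFS4ERR_DELAY;
    if (srv->creates_seen == 2)
        return NFS4_OK;
    return -NFS4ERR_EXIST;
}

/* Toy handler: clears retry on entry and requests a retry only for DELAY. */
static int toy_handle_exception(int err, struct toy_exception *exc)
{
    exc->retry = false;
    if (err == -NFS4ERR_DELAY) {
        exc->retry = true;   /* the real client would also sleep here */
        return 0;
    }
    return err;
}

/* Same shape as the loop quoted above: the handler is skipped when err == 0. */
static int toy_mkdir_gated(struct toy_server *srv)
{
    struct toy_exception exc = { .retry = false };
    int err, result;

    do {
        err = toy_create(srv);
        result = err;
        if (err)
            result = toy_handle_exception(err, &exc);
        /* On success exc.retry keeps whatever the DELAY iteration left. */
    } while (exc.retry);

    return result;
}
```

Calling toy_mkdir_gated() on a zero-initialised toy_server should issue
three CREATEs and return -NFS4ERR_EXIST in this model, which matches the
shape of the capture. Whether the real handler behaves the same way in
every detail is exactly the part I'd like someone to confirm.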
git blame points at:

    dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors
    in nfs4_do_mkdir()")

(stable backport: 062feb506caf). The change makes sense in itself —
the goal of returning the int separately from the dentry is good — but
the `if (err)` gate around nfs4_handle_exception() seems to be what
introduced the retry-state issue. I might be wrong about that, though,
so please take it with a grain of salt.
Happy to provide pcaps, more traces, or test a patch if useful.
Reproduces on demand here, so iteration should be quick.
Thanks for all the work on NFS,
Igor
* Re: REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success;
2026-04-29 5:02 REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success; Igor Raits
@ 2026-04-29 7:12 ` Igor Raits
2026-04-29 9:58 ` NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Igor Raits @ 2026-04-29 7:12 UTC
To: NeilBrown, Anna Schumaker, Trond Myklebust, linux-nfs
Cc: Jaroslav Pulchart, Jan Cipa
On Wed, Apr 29, 2026 at 7:02 AM Igor Raits <igor@gooddata.com> wrote:
> [...]
> Happy to provide pcaps, more traces, or test a patch if useful.
> Reproduces on demand here, so iteration should be quick.

FTR I have applied following patch and it seems to fix our issue:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a0885ae55abc..ffd14141ea1d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5393,10 +5393,9 @@ static struct dentry *nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
 	do {
 		alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
 		trace_nfs4_mkdir(dir, &dentry->d_name, err);
+		err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
 		if (err)
-			alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
-							      err,
-							      &exception));
+			alias = ERR_PTR(err);
 	} while (exception.retry);
 	nfs4_label_release_security(label);

* Re: REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success;
2026-04-29 7:12 ` Igor Raits
@ 2026-04-29 9:58 ` NeilBrown
2026-04-29 10:49 ` [PATCH] NFSv4: clear exception state on successful mkdir retry Igor Raits
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2026-04-29 9:58 UTC
To: Igor Raits
Cc: Anna Schumaker, Trond Myklebust, linux-nfs, Jaroslav Pulchart, Jan Cipa
On Wed, 29 Apr 2026, Igor Raits wrote:
> On Wed, Apr 29, 2026 at 7:02 AM Igor Raits <igor@gooddata.com> wrote:
> > [...]
> > While poking around in fs/nfs/nfs4proc.c I noticed nfs4_proc_mkdir()
> > looks like this in current master:
> >
> >     do {
> >         alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
> >         trace_nfs4_mkdir(dir, &dentry->d_name, err);
> >         if (err)
> >             alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
> >                                                   err, &exception));
> >     } while (exception.retry);

Oh dear, that was careless of me.
We *always* need to call nfs4_handle_exception().

> > [...]
>
> FTR I have applied following patch and it seems to fix our issue:
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index a0885ae55abc..ffd14141ea1d 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5393,10 +5393,9 @@ static struct dentry *nfs4_proc_mkdir(struct
> inode *dir, struct dentry *dentry,
>  	do {
>  		alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
>  		trace_nfs4_mkdir(dir, &dentry->d_name, err);
> +		err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
>  		if (err)
> -			alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
> -							      err,
> -							      &exception));
> +			alias = ERR_PTR(err);
>  	} while (exception.retry);
>  	nfs4_label_release_security(label);
>

That is exactly the patch I was thinking of when I saw your first email.

If you would like to create a properly formatted patch for submission,
please add
   Reviewed-by: NeilBrown <neil@brown.name>

If you don't want to, I can do it for you.

Thanks for the report.

NeilBrown

* [PATCH] NFSv4: clear exception state on successful mkdir retry
2026-04-29 9:58 ` NeilBrown
@ 2026-04-29 10:49 ` Igor Raits
0 siblings, 0 replies; 4+ messages in thread
From: Igor Raits @ 2026-04-29 10:49 UTC
To: Trond Myklebust, Anna Schumaker
Cc: NeilBrown, Jan Čípa, linux-nfs, linux-kernel, stable

After a server returns NFS4ERR_DELAY for an NFSv4 CREATE issued by
mkdir(2), the client correctly waits and retries. When the retry
succeeds, however, mkdir(2) can still surface -EEXIST to userspace
even though the directory was just created on the server.

Reproducer (random 16-hex names so collisions are not the cause)
against an in-kernel Linux nfsd; reproduces under both NFSv4.0 and
NFSv4.2:

    N=2000000; base=/var/gdc/export
    for ((i=1; i<=N; i++)); do
        d=$base/$(openssl rand -hex 8)
        mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
        rmdir "$d" 2>/dev/null
    done

Failures cluster at the cadence at which the server-side auth/export
cache refresh path causes nfsd to return NFS4ERR_DELAY for CREATE. A
wire trace of one failure (the three CREATE RPCs all come from a
single mkdir(2), generated by the do-while in nfs4_proc_mkdir()):

    client -> server  CREATE name=...  -> NFS4ERR_DELAY
                      ~100 ms later
    client -> server  CREATE name=...  -> NFS4_OK        (dir created)
                      ~80 us later
    client -> server  CREATE name=...  -> NFS4ERR_EXIST  (correct)

Since commit dd862da61e91 ("nfs: fix incorrect handling of
large-number NFS errors in nfs4_do_mkdir()"), nfs4_handle_exception()
is called only when _nfs4_proc_mkdir() returned an error. That gate
breaks retry-state hygiene: nfs4_do_handle_exception() resets
exception.{delay,recovering,retry} to 0 on entry, so calling it on
success is what previously cleared the retry flag set by the preceding
NFS4ERR_DELAY iteration. With the gate in place, exception.retry stays
at 1 after the successful retry, the loop runs once more, and the
resulting CREATE for an already-created name yields NFS4ERR_EXIST ->
-EEXIST to userspace.

Drop the conditional and call nfs4_handle_exception() unconditionally,
matching every other do-while in fs/nfs/nfs4proc.c
(nfs4_proc_symlink(), nfs4_proc_link(), etc.). The dentry/status
separation introduced by that commit is preserved.

Fixes: dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()")
Reported-and-tested-by: Jan Čípa <jan.cipa@gooddata.com>
Closes: https://lore.kernel.org/linux-nfs/CA+9S74hSp_tJu2Ffe2BPNC2T25gfkhgjjDkdgSsF5c2rnJq_wA@mail.gmail.com/
Reviewed-by: NeilBrown <neil@brown.name>
Cc: stable@vger.kernel.org
Signed-off-by: Igor Raits <igor.raits@gmail.com>
---
 fs/nfs/nfs4proc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a0885ae55abc..ffd14141ea1d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5393,10 +5393,9 @@ static struct dentry *nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
 	do {
 		alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
 		trace_nfs4_mkdir(dir, &dentry->d_name, err);
+		err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
 		if (err)
-			alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
-							      err,
-							      &exception));
+			alias = ERR_PTR(err);
 	} while (exception.retry);
 	nfs4_label_release_security(label);
-- 
2.53.0

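
For readers who want to see the intended behaviour without a kernel
build, the patched loop can be sketched as a small userspace C model.
This is an illustration under stated assumptions, not kernel code:
toy_server, toy_exception, toy_create(), toy_handle_exception() and
toy_mkdir_fixed() are hypothetical names, and the handler only mirrors
the one property the commit message relies on, namely that the
delay/recovering/retry fields are reset on entry.

```c
#include <assert.h>
#include <stdbool.h>

#define NFS4_OK        0
#define NFS4ERR_EXIST  17
#define NFS4ERR_DELAY  10008

/* Hypothetical stand-in for the three fields named in the commit message. */
struct toy_exception {
    bool delay;
    bool recovering;
    bool retry;
};

/* Hypothetical server that replays a scripted list of CREATE replies. */
struct toy_server {
    const int *replies;
    int n_replies;
    int calls;
};

/* Return the next scripted reply; repeat the last one when the script ends. */
static int toy_create(struct toy_server *srv)
{
    int idx;

    assert(srv && srv->n_replies > 0);
    idx = srv->calls < srv->n_replies ? srv->calls : srv->n_replies - 1;
    srv->calls++;
    return srv->replies[idx];
}

/* Toy handler: reset all state on entry, then ask for a retry on DELAY. */
static int toy_handle_exception(int err, struct toy_exception *exc)
{
    exc->delay = false;
    exc->recovering = false;
    exc->retry = false;

    if (err == -NFS4ERR_DELAY) {
        exc->delay = true;   /* the real client would sleep before retrying */
        exc->retry = true;
        return 0;
    }
    return err;              /* success (0) and hard errors pass through */
}

/* Shape of the patched loop: the handler runs on every iteration. */
static int toy_mkdir_fixed(struct toy_server *srv)
{
    struct toy_exception exc = { 0 };
    int err;

    do {
        err = toy_create(srv);
        err = toy_handle_exception(err, &exc);
    } while (exc.retry);

    return err;
}
```

With a scripted DELAY, OK, EXIST sequence the model should stop after
the second CREATE and return 0, and with a script that starts with
EXIST it should return -NFS4ERR_EXIST after a single CREATE, so a
genuine name collision is still reported.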